This application claims the benefit of priority to Japanese Patent Application Number 2020-079478 filed on Apr. 28, 2020. The entire contents of the above-identified application are hereby incorporated by reference.
The present disclosure relates to a model evaluating device, a model evaluation method, and a program for optimizing a prediction model.
Prediction models configured to perform machine learning and generate predicted values of target variables for explanatory variables have been proposed. For example, JP 2020-27556 A discloses a device that performs machine learning. The device is configured to learn using training data including status data and control condition data and output recommended control condition data (a target variable) indicating a recommended control condition for each target device in response to an input of the status data (an explanatory variable).
Incidentally, in recent years, metamorphic testing (MT) has been proposed as a method for evaluating systems. In MT, the system is evaluated using a relationship called metamorphic relations (MR). MR is a relationship in which a change in output data when a predetermined change is applied to input data is known. For example, a relationship indicating that the calculation result of the value of sin(π) and the calculation result of the value of sin(π+2π) are the same is also MR.
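The sin(π) relationship above can be written as a minimal metamorphic test. The following sketch is purely illustrative and not part of the disclosure; the function `mr_holds` and its tolerance are assumptions introduced only to make the MR concept concrete.

```python
import math

def mr_holds(f, x, transform, tolerance=1e-9):
    """Check a metamorphic relation: applying `transform` to the
    input should leave the output of `f` unchanged."""
    return abs(f(x) - f(transform(x))) < tolerance

# MR for the sine function: sin(x) and sin(x + 2*pi) must agree.
result = mr_holds(math.sin, math.pi, lambda x: x + 2 * math.pi)
```

A violated relation (for example, comparing sin(x) with sin(x + π)) would return False, flagging a potential defect in the system under test.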
In prediction models using machine learning in the related art, robustness may not be ensured due to bias in training data. For example, in a case where a prediction model is applied to a plant whose characteristics vary depending on the outdoor temperature, if the prediction model is trained only on training data acquired in summer, prediction accuracy in winter may decrease. Likewise, component degradation of the plant may change its performance, which may decrease prediction accuracy. In addition, due to individual differences in components and differences in fuel properties, operating conditions may deviate from those at the time of learning, and the prediction accuracy may decrease.
In this way, when robustness cannot be ensured, operators may refrain from applying the prediction model to an actual machine, or may use the model while doubting its prediction results. For this reason, a technique has also been proposed in which new data is generated using MR to complement the bias in the training data, and MT is executed using the generated data. For example, in image recognition, a method of generating MR data by rotating training data and executing MT has been proposed. However, such newly generated MR data may include data that cannot actually occur. Thus, to ensure the reliability of the prediction model, the robustness of the prediction model needs to be evaluated more appropriately.
In view of the above-described circumstances, an object of the present disclosure is to provide a model evaluating device, a model evaluation method, and a program capable of more appropriately evaluating the robustness of a prediction model.
According to the present disclosure, there is provided a model evaluating device that evaluates performance of a prediction model configured to generate predicted values of a target variable for an explanatory variable. The model evaluating device includes: a generating unit configured to generate expanded MR data by transforming evaluation data; and an evaluating unit configured to evaluate the performance of the prediction model based on a first evaluation score indicating accuracy of a first predicted value generated by the prediction model based on the evaluation data, and a second evaluation score indicating accuracy of a second predicted value generated by the prediction model based on the MR data.
According to the present disclosure, there is provided a model evaluation method for evaluating performance of a prediction model configured to generate predicted values of a target variable for an explanatory variable. The model evaluation method includes: generating expanded MR data by transforming evaluation data; and evaluating the performance of the prediction model based on a first evaluation score indicating accuracy of a first predicted value generated by the prediction model based on the evaluation data, and a second evaluation score indicating accuracy of a second predicted value generated by the prediction model based on the MR data.
According to the present disclosure, there is provided a program for causing a computer to evaluate performance of a prediction model configured to generate predicted values of a target variable for an explanatory variable. The program causes the computer to execute: generating expanded MR data by transforming evaluation data; and evaluating the performance of the prediction model based on a first evaluation score indicating accuracy of a first predicted value generated by the prediction model based on the evaluation data, and a second evaluation score indicating accuracy of a second predicted value generated by the prediction model based on the MR data.
According to the present disclosure, it is possible to provide a model evaluating device, a model evaluation method, and a program capable of more appropriately evaluating the robustness of a prediction model.
The disclosure will be described with reference to the accompanying drawings, wherein like numbers reference like elements.
Embodiments will be described hereinafter with reference to the appended drawings. It is intended, however, that unless particularly specified, dimensions, materials, shapes, relative positions and the like of components described in the embodiments shall be interpreted as illustrative only and not intended to limit the scope of the disclosure.
For instance, an expression of relative or absolute arrangement such as “in a direction”, “along a direction”, “parallel”, “orthogonal”, “centered”, “concentric” and “coaxial” shall not be construed as indicating only the arrangement in a strict literal sense, but also includes a state where the arrangement is relatively displaced by a tolerance, or by an angle or a distance within a range in which it is possible to achieve the same function.
For instance, an expression of an equal state such as “same”, “equal”, “uniform” and the like shall not be construed as indicating only the state in which the feature is strictly equal, but also includes a state in which there is a tolerance or a difference within a range where it is possible to achieve the same function.
Further, for instance, an expression of a shape such as a rectangular shape, a cylindrical shape or the like shall not be construed as only the geometrically strict shape, but also includes a shape with unevenness, chamfered corners or the like within the range in which the same effect can be achieved.
On the other hand, expressions such as “comprise”, “include”, “have”, “contain” and “constitute” are not intended to be exclusive of other constituent elements.
Hereinafter, a configuration of a prediction system 1 including a model evaluating device 100 according to an embodiment will be described.
As illustrated in
Note that the network NW is, for example, a wide area network (WAN) or a local area network (LAN). Gateway devices such as modems and routers are not illustrated.
In the prediction system 1 (1A), the prediction device 200 is disposed at a place (local location) provided with a facility such as a plant, and the server device 400 (400A) is disposed at a monitoring site (remote location). Prediction results of the prediction device 200 are transmitted to the server device 400 (400A). An operator may check the prediction results of the prediction device 200 via the server device 400 (400A), and transmit various instruction signals to the prediction device 200 via the server device 400 (400A) and the network NW. With the prediction system 1 (1A), prediction and evaluation of performance of a prediction model can be performed at a local location.
In the prediction system 1 (1B), the transmission device 500 is disposed at a place (local location) provided with a facility such as a plant, and the server device 400 (400B) is disposed at a monitoring site (remote location). The server device 400 (400B) is configured to predict target variables in a case where the measured values received from the transmission device 500 are used as explanatory variables. An operator may check the prediction results output from the server device 400 (400B). With the prediction system 1 (1B), prediction and evaluation of performance of a prediction model can be performed at a remote location.
Note that the configuration of the prediction system 1 is not limited to an edge type as illustrated in
The model evaluating device 100 may be constituted by a plurality of devices instead of one device. That is, the model evaluating device 100 may be implemented through cooperation between a plurality of devices by dispersing functions in the plurality of devices. The model evaluating device 100 may be a device independent of the prediction device 200 and the server device 400.
The prediction model may have a configuration in which a relationship between explanatory variables and target variables in the same time zone is modeled. For example, as in the embodiment illustrated in
However, the prediction model is not limited to such a configuration. The prediction model may have a configuration in which a relationship between explanatory variables and target variables in different time zones is modeled. For example, the prediction model may model a relationship between an explanatory variable in a certain time zone and a target variable in a time zone later than that of the explanatory variable. In this case, the prediction model is suitable for predicting future target variables, creating an operation plan for a future facility, and the like. For example, it is possible to cause the prediction model to predict future weather and power generating capacity based on a measured value of the current outdoor temperature. As described above, the explanatory variables and the target variables may be data in the same time zone, or data in different time zones.
Hereinafter, a configuration of the model evaluating device 100 according to the embodiment will be described.
As illustrated in
The communication unit 11 is a communication interface including a network interface card controller (NIC) for performing wired communication or wireless communication. The communication unit 11 communicates with other devices via the network NW such as a WAN, a LAN, or the like.
The storage unit 12 includes, for example, a random access memory (RAM), a read only memory (ROM), and the like. The storage unit 12 stores programs, various types of data, and the like for performing various control processes. For example, the storage unit 12 stores information such as a prediction model to be applied to an actual machine, a program for performing an optimization process, a prediction model in a state of being relearned using MR data, a prediction model in a state of not being relearned using MR data, a prediction result, an arithmetic equation of evaluation indexes, an evaluation result, evaluation data, MR data, and the like.
The input unit 13 is constituted by an input device such as an operation button, a keyboard, a pointing device, and a microphone, for example. The input unit 13 is an input interface used by a user (for example, an operator in a local or remote location) to input an instruction.
The display unit 14 is constituted by a display device such as a liquid crystal display (LCD) and an electroluminescence (EL) display, for example. The display unit 14 displays various types of information (e.g., a prediction result and an evaluation result).
The control unit 15 is constituted by a processor such as a central processing unit (CPU) and a graphics processing unit (GPU). The control unit 15 implements various functions to be described later by executing the program stored in the storage unit 12.
Hereinafter, a functional configuration of the control unit 15 will be described. The control unit 15 functions as a prediction execution unit 151, a generating unit 152, an evaluating unit 153, a cluster processing unit 154, and an assigning unit 155.
The prediction execution unit 151 is configured to acquire the predicted value of the target variable for the explanatory variable using the prediction model. For example, in a case where a prediction model is stored in the storage unit 12, the prediction execution unit 151 inputs an explanatory variable into the prediction model to acquire a predicted value of a target variable. For example, in a case where the prediction model is stored in another device, the prediction execution unit 151 transmits, via the communication unit 11, an explanatory variable to the device, and receives, from the device, a predicted value of a target variable.
The generating unit 152 is configured to generate expanded MR data by transforming evaluation data. The evaluation data is time series data indicating temporal changes in the explanatory variable and the target variable, and is data that has actually been obtained. The MR data is data for expanding variations in the evaluation data. Hereinafter, specific examples of the evaluation data will be described. Note that while MT using image data is known as a technology in the related art, no known technique performs MT on time series data or generates MR data from time series data.
The symbol “l” added to the variables A, B, C, and Y indicates training data, the symbol “v” indicates verification data, and the symbol “a” indicates actual data. For example, training data items Al, Bl, Cl, and Yl are data used at the time of initial learning of the prediction model. Verification data items Av, Bv, Cv, and Yv are data acquired at the time of verification for verifying the performance of the prediction model before actual operation has started. The training data items Al, Bl, Cl, and Yl and the verification data items Av, Bv, Cv, and Yv are preferably data acquired in different time zones. Actual data items Aa, Ba, Ca, and Ya are data acquired after actual operation has started. The actual data items Aa, Ba, Ca, and Ya may be used as training data when updating the prediction model.
The MR data is virtual data obtained by processing the evaluation data. The processing may be partial processing (for example, processing only some section of time series data). Specific examples of the MR data will be described later.
The evaluating unit 153 is configured to evaluate the performance of the prediction model. Specifically, the evaluating unit 153 evaluates the performance of the prediction model based on a first evaluation score indicating accuracy of a first predicted value generated by the prediction model based on the evaluation data and a second evaluation score indicating accuracy of a second predicted value generated by the prediction model based on the MR data generated by the generating unit 152.
Here, the evaluation scores indicating the accuracy of these predicted values will be described. The first evaluation score is the difference between a true value (a known target variable actually obtained) and the first predicted value. The second evaluation score is the difference between the target variable generated by the generating unit 152 and the second predicted value. The details of the evaluation scores will be described later.
The cluster processing unit 154 is configured to generate a plurality of clusters by clustering the evaluation data. The evaluating unit 153 uses the plurality of clusters generated by the cluster processing unit 154 as evaluation data to evaluate the performance of the prediction model. That is, the evaluating unit 153 time-divides the evaluation data, classifies the divided data items into clusters based on whether they are similar to each other, and evaluates the performance of the prediction model for each cluster.
Note that clustering may be performed in units of one explanatory variable (for example, determining similarity by focusing only on A), or in units of a plurality of explanatory variables (for example, determining whether A, B, and C are all similar). Clustering may also be performed by focusing on the target variable Y.
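As one possible sketch of this clustering, the time series is divided into fixed windows and windows with similar mean values are grouped. The fixed-window division and mean-distance grouping below are assumptions made only for illustration; the disclosure does not limit the clustering algorithm, and k-means, hierarchical clustering, or the like could serve equally.

```python
def time_divide(series, window):
    """Split a time series into consecutive segments of `window` samples."""
    return [series[i:i + window] for i in range(0, len(series), window)]

def cluster_segments(segments, threshold):
    """Assign each segment to the first cluster whose representative
    mean lies within `threshold`; otherwise open a new cluster."""
    clusters = []  # list of (representative_mean, [segments])
    for seg in segments:
        mean = sum(seg) / len(seg)
        for rep in clusters:
            if abs(rep[0] - mean) <= threshold:
                rep[1].append(seg)
                break
        else:
            clusters.append((mean, [seg]))
    return clusters
```

The performance of the prediction model can then be evaluated per cluster, as described above.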
Note that in
The assigning unit 155 is configured to assign a weighting to the second evaluation score, which indicates the accuracy of the second predicted value generated by the prediction model based on the MR data generated by the generating unit 152. The assigning unit 155 may assign the weighting in accordance with at least one of the type of transformation processing, the amount of transformation, and the target data of the MR data. The evaluating unit 153 may evaluate the performance of the prediction model based on the second evaluation score to which the weighting is assigned.
When the evaluation data is time series data, the target data for the transformation processing of the MR data may be the cluster c to be transformed (for example, cluster 3), or a time t to be transformed (a time window tw illustrated in
Further, the weighting may be performed in accordance with the results of analyzing the frequency of occurrences, or in accordance with a condition set by the user based on knowledge (a determination in consideration of likelihood or validity). The frequency of occurrences may be, for example, the number of pieces of data classified into the same cluster c. In items such as the type of MR transformation processing, the amount of transformation, or the target data, a relatively significant change pattern, such as a change pattern that is likely to occur or a change pattern with a large effect on performance, may be assigned a greater weight than a non-significant change pattern. By adjusting the second evaluation score through such weighting, it is possible to improve prediction accuracy for a desired change pattern.
Hereinafter, specific examples of the MR data generated by the generating unit 152 will be described. The generating unit 152 may generate the MR data by adding at least one of offset processing, slope change processing of the temporal change, time axis inversion processing, time constant change processing of the temporal change, filtering processing, noise addition processing, and transformation processing using generative adversarial networks (GAN) to the waveform indicating the temporal change of the evaluation data.
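The listed transformations can be sketched for a discretely sampled waveform as follows. This sketch is purely illustrative: the function names, the first-order lag filter used for the time constant change, and the uniform noise model are assumptions, not the disclosed implementation.

```python
import random

def add_offset(wave, offset):
    """Offset processing: shift the whole waveform by a constant."""
    return [v + offset for v in wave]

def change_slope(wave, gain):
    """Slope change processing: scale deviations around the first sample."""
    base = wave[0]
    return [base + gain * (v - base) for v in wave]

def invert_time_axis(wave):
    """Time axis inversion processing: reverse the temporal order."""
    return wave[::-1]

def change_time_constant(wave, alpha):
    """Time constant change processing: first-order lag filter,
    alpha in (0, 1]; smaller alpha means a slower response."""
    out = [wave[0]]
    for v in wave[1:]:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out

def add_noise(wave, scale, seed=0):
    """Noise addition processing: zero-mean uniform noise."""
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in wave]
```

Each function returns a new waveform, so the transformations can be composed to combine MR data as described later.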
Hereinafter, each type of transformation processing of the MR data will be described. Note that, in the following description, one explanatory variable A and one target variable Y will be described as representative examples for the sake of simplifying the description. However, the explanatory variable and the target variable are not limited to such examples. The number of explanatory variables and target variables may be one, or plural (for example, explanatory variables A, B, C, and D and target variables X and Y).
First, slope change processing of the temporal change will be described.
As illustrated in
As illustrated in
Next, offset processing will be described.
As illustrated in
Next, time constant change processing of the temporal change will be described.
As illustrated in
Next, time axis inversion processing will be described.
As illustrated in
Note that in the examples illustrated in
Next, filtering processing, noise addition processing, and transformation processing using a GAN will be described. For example, a waveform with white noise added and a waveform with the white noise removed can form an MR pair, because their average values are the same. Thus, such an MR may be used to generate MR data. Furthermore, in generating the MR data, a technique for generating new data by transformation processing using a GAN generation network may be applied. Evaluation using such MR data also makes it possible to evaluate robustness with respect to the magnitude of the slope, unlike the evaluation of robustness with respect to the slope by the slope change processing of the temporal change.
The above-described MR data can be combined as appropriate. Robustness can be more appropriately evaluated by combining the above-described MR data and using it for optimizing the prediction model.
MR data may be generated by transforming some of the clusters. That is, MR data may be generated in units of clusters. Hereinafter, examples thereof will be described.
As illustrated in
Also, as illustrated in
The MR data may be generated by processing the time series data at some or all times. Here, the time concept in the processing may be set as a subordinate concept within a cluster, or may be set as a concept separate from the clusters without clustering. For example, in the weighting as well, the weights may be changed between the first three minutes and the last three minutes of a cluster, or between a five-minute span of the time series data and the times outside that span.
Hereinafter, a time window tw when the time is the processing target will be described.
First, as illustrated in
Furthermore, the evaluating unit 153 may add an offset to the explanatory variable Av in the time window tw. The evaluating unit 153 may adjust the waveform portion other than the time window tw by scaling with the apex fixed so as to connect to the explanatory variable Av after adding the offset. Thus, an explanatory variable Av* of the MR data illustrated in
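One possible sketch of a time window transformation is shown below. The linear ramp used here to reconnect the waveform outside the window tw is an assumption made for illustration; the description above instead adjusts the surrounding portion by scaling with the apex fixed, and only requires that the adjusted portion connects to the offset section.

```python
def offset_in_window(wave, start, end, offset, ramp=2):
    """Add `offset` to the samples in [start, end) and linearly ramp
    the neighbouring `ramp` samples so the waveform stays connected."""
    out = list(wave)
    for i in range(start, end):
        out[i] += offset
    # Ramp into the window from the left.
    for k in range(1, ramp + 1):
        i = start - k
        if 0 <= i < len(out):
            out[i] += offset * (1 - k / (ramp + 1))
    # Ramp out of the window to the right.
    for k in range(1, ramp + 1):
        i = end - 1 + k
        if 0 <= i < len(out):
            out[i] += offset * (1 - k / (ramp + 1))
    return out
```

Applying this to the explanatory variable Av yields a connected waveform in which only the time window tw (and its immediate neighbourhood) is shifted.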
Specific examples of the MR data have been described above. Note that the storage unit 12 of the model evaluating device 100 may be configured to store information regarding MR data. The information regarding the MR data may be MR data, or may be additional information indicating the type of transformation processing, the amount of transformation, target data, and the like in generating the MR data.
The information regarding the MR data stored in the storage unit 12 may be updated in accordance with the learning state of the prediction model. In this case, the storage unit 12 can store more appropriate MR data. Note that the storage unit 12 may further store information regarding evaluation data, a weighting assigned once, an arithmetic equation used for evaluation to be described later, various predicted values, various evaluation scores to be described later, evaluation results to be described later, results of necessity determination to be described later, and the like. According to such a configuration, for example, information regarding MR such as the data used to generate past MR data and the generated MR data can be read from the storage unit 12 and reused.
Hereinafter, performance evaluation of the prediction model by the model evaluating device 100 according to some embodiments will be described. This performance evaluation is performed when considering the effectiveness of relearning using MR data, the need for updating the prediction model or updating the weighting, and the like, and when applying the update.
The evaluating unit 153 is configured to evaluate the performance of the prediction model based on a first evaluation score indicating the accuracy of a first predicted value and a second evaluation score indicating the accuracy of a second predicted value. Furthermore, the evaluating unit 153 acquires, as a third evaluation score, the first evaluation score when the prediction model after relearning based on the MR data is evaluated using the evaluation data (actual data) acquired after actual operation has started. The evaluating unit 153 acquires, as a fourth evaluation score, the first evaluation score when the prediction model before relearning based on the MR data is evaluated using the evaluation data (actual data) acquired after actual operation has started. The evaluating unit 153 acquires, as a fifth evaluation score, the second evaluation score when the prediction model after relearning based on the MR data is evaluated using the MR data based on the evaluation data (actual data) acquired after actual operation has started. The evaluating unit 153 acquires, as a sixth evaluation score, the second evaluation score when the prediction model before relearning based on the MR data is evaluated using the MR data based on evaluation data (actual data) acquired after actual operation has started. Note that updating the prediction model is performed by relearning the prediction model based on the evaluation data (actual data) acquired after actual operation has started and/or the MR data.
The evaluating unit 153 may be configured to determine the necessity of at least one of updating the prediction model and updating the weighting assigned to the second evaluation score in accordance with the evaluation results based on the third evaluation score, the fourth evaluation score, the fifth evaluation score, and the sixth evaluation score. The result of the necessity determination may be presented to a user as reference information. In this case, the user may manually update the prediction model and the weighting. Furthermore, the update process of the prediction model or the update process of the weighting used by the model evaluating device 100 may be automatically executed based on the result of the necessity determination instead of the manual operation of the user.
The evaluating unit 153 may be configured to execute at least one of a process of applying the update to the prediction model and a process of updating the weighting assigned by the assigning unit 155 in accordance with the result of the necessity determination. In this case, the update of the prediction model and the update of the weighting are automatically executed in accordance with the result of the necessity determination, and thus the burden on the user can be reduced.
The evaluating unit 153 may be configured to calculate an evaluation index based on the square of the first evaluation score and a weighted sum of the squares of the second evaluation scores indicating the accuracy of the second predicted values, and to optimize the prediction model based on the calculated evaluation index. The evaluation index is J calculated using Equation (1) below, for example. Note that Equation (1) is an equation for calculating an evaluation index for the entire time series data.
J = (ŷ − y)² + Σ w(c, m, s){(ŷMR(c, m, s) − yMR(c, m, s))² + (ŷMR−(c, m, s) − yMR−(c, m, s))²}  (1)
In Equation (1), ŷ indicates the first predicted value, and y indicates the true value, that is, the target variable of the evaluation data. ŷMR indicates the second predicted value, and yMR indicates the MR true value. The MR true value is a value obtained by adding the difference due to the MR transformation to the true value y of the evaluation data. The subscript MR indicates the processed portion of the MR data, and the subscript MR− indicates the unprocessed portion of the MR data. (c, m, s) means that the weight w, ŷMR, and ŷMR− are indexed by the cluster c, by m indicating the type or magnitude of the MR transformation, and by s indicating the time window tw. That is, there are a weight w, a second predicted value ŷMR, and ŷMR− for each combination of these elements. In Equation (1), ŷ − y corresponds to the first evaluation score, and ŷMR(c, m, s) − yMR(c, m, s) and ŷMR−(c, m, s) − yMR−(c, m, s) correspond to the second evaluation score.
Σ indicates that the sum is calculated over all combinations of the elements. For example, when a cluster c1 is processed, the unprocessed portions of the MR data are the other clusters c2, c3, . . . , cz (where z is any numerical value).
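Equation (1) can be computed directly. In the sketch below, the predicted values and MR true values are supplied per combination (c, m, s) as already-prepared numbers; this calling convention is an illustrative assumption, not a form fixed by the disclosure.

```python
def evaluation_index(y_hat, y, mr_terms):
    """Compute J of Equation (1).

    mr_terms: iterable of tuples
      (w, y_hat_mr, y_mr, y_hat_mr_rest, y_mr_rest),
    one tuple per combination (c, m, s); *_rest denotes the
    unprocessed (MR−) portion of the MR data."""
    j = (y_hat - y) ** 2
    for w, yh_mr, y_mr, yh_rest, y_rest in mr_terms:
        j += w * ((yh_mr - y_mr) ** 2 + (yh_rest - y_rest) ** 2)
    return j
```

With an empty `mr_terms`, J reduces to the squared first evaluation score, matching the third and fourth evaluation scores described later.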
For example, the evaluating unit 153 may optimize the prediction model by causing the prediction model to learn so that the performance of the prediction model is increased in accordance with the evaluation index. It can be said that, if the evaluation index is J, the smaller the evaluation index, the higher the performance of the prediction model. On the other hand, if the evaluation index is the reciprocal of J, the smaller the evaluation index, the lower the performance of the prediction model. In other words, the relationship between the performance of the prediction model and the evaluation index depends on the definition. Thus, as long as the learning is performed such that the performance of the prediction model is increased, a configuration in which the evaluation index is increased may be used, or a configuration in which the evaluation index is decreased may be used.
The evaluating unit 153 may be configured to calculate an evaluation index for each combination of the type of transformation processing, the amount of transformation, and the target data of the MR data, and to extract one or more combinations evaluated as having a low performance of the prediction model. The one or more combinations may be a predetermined number of combinations evaluated as having a low performance of the prediction model (for example, the N combinations with the lowest evaluations, where N is a natural number set by the user). Further, the one or more combinations may be selected depending on whether the evaluation index is equal to or less than a reference value.
The evaluating unit 153 extracts the top N combinations, for example, by decomposing the result obtained by calculating Equation (1) described above into the individual combinations of c, m, and s and extracting N combinations from them. Note that instead of calculating the total value as in Equation (1) described above, the evaluating unit 153 may be configured to add a combination subscript to J, like Jcms, create N or more equations by varying the combination of c, m, and s, and extract the top N elements from those equations.
The evaluating unit 153 may acquire evaluation data corresponding to the target data of the extracted one or more combinations, input the acquired evaluation data into the generating unit 152, and cause the generating unit 152 to generate MR data by transformation processing corresponding to the type of transformation processing and the amount of transformation of the one or more combinations. Also, the evaluating unit 153 may be configured to cause the prediction model to perform relearning using the generated MR data. In this case, the prediction model performs relearning on data having a low performance, and thus the prediction accuracy is improved.
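A sketch of the extraction of the N worst combinations follows. The per-combination index Jcms is assumed here to be available as a dictionary keyed by (c, m, s); that data layout is an illustrative assumption.

```python
def worst_combinations(j_per_combo, n):
    """Return the n combinations (c, m, s) with the largest
    per-combination evaluation index, i.e. the lowest performance."""
    ranked = sorted(j_per_combo.items(), key=lambda kv: kv[1], reverse=True)
    return [combo for combo, _ in ranked[:n]]
```

The returned combinations identify the target data and transformation settings for which MR data should be regenerated and used for relearning.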
Here, a specific example of the results of the performance evaluation of the prediction model by the evaluating unit 153 and of the necessity determination will be described. The necessity determination is a determination, using actual data acquired after actual operation has started, as to whether updating the prediction model or updating the weighting is necessary. In this evaluation, the actual data acquired after actual operation has started and the MR data are used to evaluate the current prediction model, both with and without relearning based on the MR data.
Here, the third evaluation score may be a value obtained by calculating the evaluation index J = (ŷ − y)² for the prediction model after relearning based on the MR data. The fourth evaluation score may be a value obtained by calculating the evaluation index J = (ŷ − y)² for the prediction model before relearning based on the MR data. The fifth evaluation score may be a value obtained by calculating the evaluation index J = Σ w(c, m, s){(ŷMR(c, m, s) − yMR(c, m, s))² + (ŷMR−(c, m, s) − yMR−(c, m, s))²} for the prediction model after relearning based on the MR data. The sixth evaluation score may be a value obtained by calculating the evaluation index J = Σ w(c, m, s){(ŷMR(c, m, s) − yMR(c, m, s))² + (ŷMR−(c, m, s) − yMR−(c, m, s))²} for the prediction model before relearning based on the MR data.
First, in Case 1, all of the evaluation scores are good. In this case, because performance is good regardless of the presence or absence of relearning based on the MR data, it may be determined that updating the prediction model is unnecessary. In Case 2, because all of the evaluation scores are poor, it is thought that actual data completely different from both the learned data and the MR data has been input. In this case, updating the prediction model may be determined to be necessary, or the input may be regarded as an outlier and the update determined to be unnecessary. In Case 3, only the third evaluation score is good, and the other evaluation scores are poor. In this case, it can be seen that relearning using the MR data was effective, and the result may also be reflected in the weight update.
In Case 4, it can be seen that the prediction accuracy has dropped on the MR data generated from the actual data, and therefore updating the prediction model may be considered. In Case 5, the results for the actual data are poor regardless of the presence or absence of relearning based on the MR data, so it is conceivable that unlearned actual data has been input. Thus, it may be determined that updating the prediction model is necessary. In Case 6, because the results of the third evaluation score and the fifth evaluation score are good, it may be determined that updating the prediction model is unnecessary. In this case, the effectiveness of relearning based on the MR data can also be confirmed. In Case 7, the results of the third evaluation score and the fifth evaluation score are poor, so it can be seen that relearning using the MR data is not successful. Thus, it may be determined that updating the weighting is necessary.
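The case-by-case determination above can be encoded as a small lookup. This sketch, offered purely as an illustration, covers only the rules stated for Cases 1 to 7 (True means the score is "good"); how a raw score is thresholded into good or poor is an assumption left to the caller.

```python
def necessity(third, fourth, fifth, sixth):
    """Map the good/poor pattern of the third to sixth evaluation
    scores to the update determination described for Cases 1-7."""
    if third and fourth and fifth and sixth:
        return "no update"                         # Case 1: all good
    if not (third or fourth or fifth or sixth):
        return "update model (or treat as outlier)"  # Case 2: all poor
    if third and not fourth and not fifth and not sixth:
        return "update weighting"                  # Case 3: relearning effective
    if third and fifth:
        return "no update"                         # Case 6: relearned model good
    if not third and not fifth:
        return "update weighting"                  # Case 7: relearning not successful
    return "update model"                          # Cases 4 and 5
```

The ordering of the checks matters because the patterns overlap; the fully specified cases are tested first.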
The flow of the model evaluation method will be described below with reference to
As illustrated in
The evaluating unit 153 extracts a combination with low prediction accuracy from evaluation results based on the evaluation index (step S12). The evaluating unit 153 calculates the evaluation index J for the entire time series data, and extracts one or more combinations of the type of transformation processing, the amount of transformation, and target data (target cluster and target time) of the MR data with low prediction accuracy. Note that the evaluating unit 153 may extract the MR data with low prediction accuracy by focusing on only one or more of the type of transformation processing, the amount of transformation, and the target data.
Here, it is determined whether to perform the relearning (step S13). For example, the model evaluating device 100 may determine whether to perform the relearning in response to an instruction input by an operator. In this case, the model evaluating device 100 may display the evaluation results of the MR data with low prediction accuracy so that the operator can determine whether to perform the relearning. Alternatively, the model evaluating device 100 may compare the evaluation results with a threshold value to determine whether to perform the relearning. Note that step S13 may be omitted, so that the relearning is always performed, or only the performance evaluation may be performed without the relearning. When the relearning is not performed (step S13; No), the model evaluating device 100 ends the process.
When the relearning is performed (step S13; Yes), the model evaluating device 100 causes the prediction model to perform relearning on the MR data of the combination with low prediction accuracy (step S14). Specifically, the model evaluating device 100 acquires evaluation data corresponding to the target data of the extracted one or more combinations, inputs the acquired evaluation data into the generating unit 152, and causes the generating unit 152 to generate MR data by transformation processing corresponding to the type of transformation processing and the amount of transformation of the one or more combinations. In addition, the evaluating unit 153 causes the prediction model to perform relearning using the generated MR data. Note that the calculation of the evaluation index and the execution of relearning may be performed repeatedly until sufficient robustness can be ensured.
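Steps S12 to S14 can be sketched as an extract-and-relearn loop, repeated until sufficient robustness is reached. This is an illustration only: the model interface (`predict`, `fit`), the `transform` callback, and the dataset layout are assumptions; any regression model with those methods would fit.

```python
import numpy as np

def relearn_low_accuracy(model, mr_datasets, transform, threshold, max_rounds=5):
    """Repeat: score every (transformation type, amount, target data)
    combination, regenerate MR data for the low-accuracy ones, and relearn
    (steps S12-S14). `mr_datasets` maps a (kind, amount, target) key to
    (X_eval, y_eval); `transform(kind, amount, X, y)` returns MR data."""
    for _ in range(max_rounds):
        # Step S12: evaluation index J per combination.
        scores = {}
        for key, (X, y) in mr_datasets.items():
            kind, amount, _target = key
            X_mr, y_mr = transform(kind, amount, X, y)
            scores[key] = float(np.mean((model.predict(X_mr) - y_mr) ** 2))
        bad = [k for k, j in scores.items() if j > threshold]
        if not bad:
            return scores  # sufficient robustness reached
        # Step S14: regenerate MR data for the bad combinations and relearn.
        for key in bad:
            kind, amount, _target = key
            X, y = mr_datasets[key]
            X_mr, y_mr = transform(kind, amount, X, y)
            model.fit(X_mr, y_mr)
    return scores
```

The loop mirrors the note above that the index calculation and relearning may be performed repeatedly; `max_rounds` is an added safeguard against non-convergence.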
First, the evaluating unit 153 performs performance evaluation of the prediction model by using the actual data and the MR data, and acquires various evaluation scores (step S21). The various evaluation scores are the third evaluation score, the fourth evaluation score, the fifth evaluation score, and the sixth evaluation score. This yields one of the results of Case 1 to Case 7 illustrated in
The evaluating unit 153 determines the necessity of updating the prediction model and updating the weighting based on the various evaluation scores (step S22). The evaluating unit 153 determines whether updating the prediction model or the weighting is necessary as a result of the necessity determination (step S23). When the update is determined to be unnecessary (step S23; No), the process ends. On the other hand, if the update is determined to be necessary (step S23; Yes), the evaluating unit 153 updates the prediction model or updates the weighting assigned by the assigning unit 155 (step S24).
The present disclosure is not limited to the embodiments described above and also includes a modification of the above-described embodiments and a combination of a plurality of embodiments as appropriate.
The details described in each embodiment can be understood as follows, for example.
(1) According to the present disclosure, there is provided a model evaluating device (100) that evaluates performance of a prediction model configured to generate predicted values of a target variable for an explanatory variable. The model evaluating device (100) includes: a generating unit (152) configured to generate expanded MR data by transforming evaluation data; and an evaluating unit (153) configured to evaluate the performance of the prediction model based on a first evaluation score indicating accuracy of a first predicted value generated by the prediction model based on the evaluation data, and a second evaluation score indicating accuracy of a second predicted value generated by the prediction model based on the MR data.
According to the above configuration, even in a case where the actually obtained evaluation data alone is insufficient for evaluating the performance of the prediction model, the performance of the prediction model is also evaluated using the MR data, so that the robustness of the prediction model can be evaluated more appropriately.
(2) In some embodiments, in the configuration described in (1) above, the evaluation data is time series data indicating temporal changes in the explanatory variable and the target variable.
For example, in facilities such as plants, power generation devices, and the like, it may be necessary to control the operation of various devices or to create an operation plan for the facility. In this case, it is conceivable to use time series data of measured values of various sensors in operation control and the creation of an operation plan. In this regard, according to the above configuration, the performance of the prediction model is evaluated using the time series data as evaluation data. Thus, this configuration is suitable for evaluating the performance of a prediction model when the prediction result is used for operation control and the creation of an operation plan.
(3) In some embodiments, in the configuration described in (2) above, the generating unit (152) generates the MR data by adding at least one of offset processing, slope change processing of the temporal change, time axis inversion processing, time constant change processing of the temporal change, filtering processing, noise addition processing, and transformation processing using a GAN to a waveform indicating temporal change of the evaluation data.
According to the above configuration, since MR data obtained by adding at least one type of transformation processing that is often performed as a transformation example of time series data is used for performance evaluation of the prediction model, it is possible to evaluate the robustness of the prediction model that predicts a target variable for an explanatory variable that changes in time.
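A few of the transformation types listed in (3) can be sketched directly on a NumPy waveform. These are minimal illustrations only (the slope change, time constant change, and GAN-based transformations are omitted), and the parameter names are assumptions.

```python
import numpy as np

def offset(wave, amount):
    """Offset processing: shift the whole waveform by a constant."""
    return np.asarray(wave) + amount

def time_reverse(wave):
    """Time axis inversion processing: play the waveform backwards."""
    return np.asarray(wave)[::-1]

def add_noise(wave, sigma, rng=None):
    """Noise addition processing: superimpose Gaussian noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    return np.asarray(wave) + rng.normal(0.0, sigma, size=len(wave))

def moving_average(wave, width):
    """Filtering processing: simple moving-average low-pass filter."""
    kernel = np.ones(width) / width
    return np.convolve(np.asarray(wave), kernel, mode="same")
```

Each function takes the waveform indicating the temporal change of the evaluation data and returns the corresponding MR data.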
(4) In some embodiments, in the configuration described in any one of (1) to (3) above, the model evaluating device further includes a cluster processing unit (154) configured to generate a plurality of clusters by clustering the evaluation data, in which the evaluating unit (153) evaluates the performance of the prediction model by using the plurality of clusters as the evaluation data.
According to the above configuration, it is possible to classify similar evaluation data by clustering and evaluate the performance of the prediction model for each cluster.
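The clustering algorithm itself is not specified in the text; as one common choice, a minimal k-means over the evaluation samples can be sketched as follows, purely for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: returns a cluster label for each evaluation
    sample, so that the performance of the prediction model can be
    evaluated per cluster."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sample to the nearest center.
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned samples.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

The evaluating unit would then compute the evaluation index separately for the data in each cluster.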
(5) In some embodiments, in the configuration described in any one of (1) to (4) above, the model evaluating device further includes an assigning unit (155) configured to assign a weighting to the second evaluation score indicating accuracy of the second predicted value in accordance with at least one of a type of transformation processing, an amount of transformation, and target data of the MR data, in which the evaluating unit (153) evaluates the performance of the prediction model based on the second evaluation score to which the weighting is assigned.
According to the above configuration, it is possible to evaluate prediction accuracy for a desired change pattern by adjusting by weighting.
(6) In some embodiments, in the configuration described in any one of (1) to (5) above, the evaluating unit (153) is configured to: acquire, as a third evaluation score, the first evaluation score when the prediction model after learning based on the MR data is evaluated using the evaluation data acquired after actual operation has started; acquire, as a fourth evaluation score, the first evaluation score when the prediction model before learning based on the MR data is evaluated using the evaluation data acquired after the actual operation has started; acquire, as a fifth evaluation score, the second evaluation score when the prediction model after learning based on the MR data is evaluated using the evaluation data acquired after the actual operation has started; and acquire, as a sixth evaluation score, the second evaluation score when the prediction model before learning based on the MR data is evaluated using the evaluation data acquired after the actual operation has started.
According to the above configuration, it is possible to determine whether it is better to update the prediction model or the weighting.
(7) In some embodiments, in the configuration described in (6) above, the evaluating unit (153) determines a necessity of at least one of updating the prediction model and updating a weighting assigned to the second evaluation score in accordance with evaluation results based on the third evaluation score, the fourth evaluation score, the fifth evaluation score, and the sixth evaluation score.
According to the above configuration, it is possible to determine the necessity of at least one of whether an improvement in the performance of the prediction model can be expected by updating the prediction model and whether an improvement in the learning capacity of the prediction model can be expected by updating the weighting.
(8) In some embodiments, in the configuration described in (7) above, the evaluating unit (153) executes at least one of processing for updating the prediction model and processing for updating the weighting assigned by the assigning unit (155) in accordance with a result of the necessity determination.
According to the above configuration, the update of the prediction model and the update of the weighting are automatically executed in accordance with the result of the necessity determination, and thus the burden on the user can be reduced.
(9) In some embodiments, in the configuration described in any one of (1) to (8) above, the model evaluating device (100) further includes a storage unit (12) configured to store information regarding the MR data.
According to the above configuration, for example, information regarding MR such as the data used to generate past MR data and the generated MR data can be read from the storage unit (12) and reused.
(10) In some embodiments, in the configuration described in any one of (1) to (9) above, the evaluating unit (153) calculates an evaluation index based on a square of the first evaluation score indicating accuracy of the first predicted value and a sum of squares of the second evaluation score, to which a weighting is assigned, indicating accuracy of the second predicted value and evaluates the performance of the prediction model based on the evaluation index.
According to the above configuration, the balance between the first evaluation score and the second evaluation score in the evaluation index used in the performance evaluation can be adjusted by weighting. Thus, even when virtual MR data is used in evaluating the performance of the prediction model, a more realistic evaluation can be performed.
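The evaluation index in (10), the square of the first evaluation score plus a weighted sum of squares of the second evaluation scores, reduces to a one-line computation. This sketch, an illustration only, assumes the scores have already been computed as residuals.

```python
import numpy as np

def combined_index(e1, e2_list, w_list):
    """J = e1^2 + sum_i w_i * e2_i^2, balancing the real evaluation
    data against the weighted MR data."""
    e2 = np.asarray(e2_list, dtype=float)
    w = np.asarray(w_list, dtype=float)
    return float(e1 ** 2 + np.sum(w * e2 ** 2))
```

Raising the weights w_i emphasizes robustness to the corresponding MR transformations; lowering them emphasizes accuracy on the actually obtained data.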
(11) In some embodiments, in the configuration described in (10) above, the evaluating unit (153) causes the prediction model to learn so that the performance of the prediction model is increased in accordance with the evaluation index.
According to the above configuration, the prediction model can be learned using the advantages of the evaluation index described above.
(12) In some embodiments, in the configuration described in (11) above, the evaluating unit (153) calculates the evaluation index for each of combinations of a type of transformation processing, an amount of transformation, and target data of the MR data, and extracts one or more combinations evaluated as having a low performance of the prediction model.
According to the above configuration, it is possible to evaluate for what kind of data the performance of the prediction model is low. This extraction result can also be utilized for relearning.
(13) In some embodiments, in the configuration described in (12) above, the evaluating unit (153) is configured to acquire the evaluation data corresponding to the target data of the extracted one or more combinations, input the acquired evaluation data to the generating unit (152), cause the generating unit (152) to generate the MR data by transformation processing corresponding to the type of transformation processing and the amount of transformation of the one or more combinations, and cause the prediction model to perform relearning using the generated MR data.
According to the above configuration, since the prediction model is relearned on data for which its performance is low, the prediction accuracy (robustness) of the prediction model is improved.
(14) According to the present disclosure, there is provided a model evaluation method for evaluating performance of a prediction model configured to generate predicted values of a target variable for an explanatory variable. The model evaluation method includes: generating expanded MR data by transforming evaluation data; and evaluating the performance of the prediction model based on a first evaluation score indicating accuracy of a first predicted value generated by the prediction model based on the evaluation data, and a second evaluation score indicating accuracy of a second predicted value generated by the prediction model based on the MR data.
According to the above method, even in a case where the actually obtained evaluation data alone is insufficient for evaluating the performance of the prediction model, the performance of the prediction model is also evaluated using the MR data, so that the robustness of the prediction model can be evaluated more appropriately.
(15) According to the present disclosure, there is provided a program for causing a computer to evaluate performance of a prediction model configured to generate predicted values of a target variable for an explanatory variable. The program causes the computer to execute: generating expanded MR data by transforming evaluation data; and evaluating the performance of the prediction model based on a first evaluation score indicating accuracy of a first predicted value generated by the prediction model based on the evaluation data and a second evaluation score indicating accuracy of a second predicted value generated by the prediction model based on the MR data.
According to the above program, even in a case where the actually obtained evaluation data alone is insufficient for evaluating the performance of the prediction model, the performance of the prediction model is also evaluated using the MR data, so that the robustness of the prediction model can be evaluated more appropriately.
While preferred embodiments of the disclosure have been described as above, it is to be understood that variations and modifications will be apparent to those skilled in the art without departing from the scope and spirit of the disclosure. The scope of the disclosure, therefore, is to be determined solely by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
2020-079478 | Apr 2020 | JP | national |