The present invention relates to a model generation system, a model generation method, and a model generation program for generating a prediction model. The present invention also relates to a prediction system that predicts a future state based on past data.
For example, consider predicting the secular change of an inspection value of an employee or the like measured in a health checkup, or the onset probability of a lifestyle-related disease based on that value, and giving each employee advice regarding health. Specifically, consider a case where a future state (the secular change of an inspection value, a disease onset probability, etc.) under the assumption that the current lifestyle habits continue for three years is predicted from past health checkup results and data showing the lifestyle habits at that time, and then an industrial physician, an insurer, etc. proposes a review of the lifestyle habits (health guidance), or the employee performs a self-check.
In that case, the following method can be considered for obtaining the transition of the predicted value. First, a prediction model that obtains a predicted value one year ahead is learned from past data. For example, the model is learned from training data that associates past actual values (inspection values) of the prediction target with earlier inspection values that can be correlated with them, the attributes (age, etc.) of the prediction target person, and the lifestyle habits at that time; the prediction target item one year later is used as the target variable, and the other items that can be correlated with it are used as explanatory variables. Then, the process of inputting the explanatory variables into the obtained prediction model and obtaining a predicted value one year ahead is repeated for several years while the time point at which the value is predicted (the prediction time point) is advanced. At this time, by keeping the items related to lifestyle habits among the explanatory variables constant, it is possible to obtain the transition of the predicted value when the current lifestyle habits are continued for three years.
Related to the prediction of diagnosis of lifestyle-related diseases, for example, there are prediction systems described in PTLs 1 and 2. The prediction system described in PTL 1 predicts the disease onset probability of a lifestyle-related disease using a plurality of neural networks that have learned the presence or absence of onset according to the same learning pattern consisting of six items of age, body mass index (BMI), diastolic blood pressure (DBP), HDL cholesterol, LDL cholesterol, and insulin resistance index (HOMA-IR).
Further, it is described that the prediction system disclosed in PTL 2, when predicting the medical expenses, the medical practice, and the inspection values of the next year from the medical expenses, the medical practice, the inspection values, and lifestyle habits of this year, creates a model by limiting the direction of the correlation between each data (the direction of edges in the graph structure). Specifically, as shown in FIG. 38B and FIG. 38C of PTL 2, lifestyle habits affect the inspection values, the inspection values affect the medical practice, the medical practice affects the medical expenses, and these states in the past will affect these states in the future. Further, PTL 2 describes that a model is created for each age.
PTL 1: Japanese Patent Application Laid-Open No. 2012-64087
PTL 2: Japanese Patent Application Laid-Open No. 2014-225175
The problem is that when progress prediction is performed using a prediction model that has been learned, in order to improve the prediction accuracy, with all the explanatory variables that can be correlated with the prediction target and without any particular restriction, the transition of the predicted value in the progress prediction may exhibit changes that differ from common findings. For example, consider giving advice based on the transition of the predicted value obtained by predicting the onset probability of a lifestyle-related disease and the inspection values related to it when the explanatory variables related to lifestyle habits are held constant.
According to the general feeling, if the lifestyle habits are kept constant, for example, as shown in
However, if the progress prediction is performed simply by repeatedly applying the prediction model that predicts the predicted value at the next time point (for example, one year later) in a predetermined prediction time unit, although the lifestyle habits are kept constant, as shown in
In addition,
For example, as in the above health simulation, when an industrial physician, an insurer, etc. proposes a review of the lifestyle habits (health guidance) based on the results of progress prediction from past health checkup results and data showing lifestyle habits at that time, or when the employee performs a self-check, it is important for the prediction mechanism to reduce the number of samples for which the above-mentioned invalid transition of the predicted value is output. However, the prediction systems described in PTLs 1 and 2 do not consider the validity of the transition of the predicted value in the progress prediction.
For example, PTL 1 describes that by utilizing the variability of prediction results obtained from a plurality of neural networks having different constituent elements, it is possible to accurately obtain the disease onset probability of a lifestyle-related disease in a certain year in the future (specifically after six years). However, when the predicted value is obtained by such a prediction method, the transition of the predicted value does not always approach a certain value.
In addition, for example, PTL 2 describes that when predicting the medical expenses, the medical practice, and the inspection values of the next year from the medical expenses, the medical practice, the inspection values, and lifestyle habits of this year, it is possible to obtain a model that is intuitively easy to understand by limiting the direction of correlation between each data (the direction of edges in the graph structure). Further, PTL 2 discloses that a model is created for each age. However, when the transition of the predicted value is obtained by the prediction method described in PTL 2, the transition of the predicted value may not approach a certain value.
Furthermore, the above problem is not limited to the case of predicting an inspection value that lifestyle habits influence and the onset probability of a lifestyle-related disease based on the inspection value, but similarly occurs in the case of predicting items having similar properties. In other words, consider, for a certain item, the transition of its value when some of the other items related to it, other than the actual values (for example, items whose values can be controlled by a person), are set to constant values. The same problem occurs if, owing to the characteristics of the item, that transition has a valid type (pattern), such as gradually approaching (converging to) a certain value, diverging, changing in a constant direction, gradually diverging while changing the direction of change, or gradually converging while changing the direction of change. Here, “valid” means probable at least in the knowledge of the person who handles the predicted value.
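As an illustration of what checking a transition against a valid pattern might look like, the following minimal Python sketch tests one of the patterns listed above, namely a value that gradually approaches a certain value without changing its direction of change, and counts the samples that violate it. The function names and the concrete criterion (constant sign of change, non-increasing step size) are assumptions made for illustration; the text does not prescribe a specific test.

```python
def is_converging_monotonically(values, tol=1e-9):
    """Return True if the series moves in one direction with shrinking steps.

    Hypothetical criterion (not taken from the source): every step has the
    same sign, and the absolute step size never grows.
    """
    steps = [b - a for a, b in zip(values, values[1:])]
    if len(steps) < 2:
        return True  # too short to contradict the pattern
    all_up = all(s >= -tol for s in steps)
    all_down = all(s <= tol for s in steps)
    shrinking = all(abs(b) <= abs(a) + tol for a, b in zip(steps, steps[1:]))
    return (all_up or all_down) and shrinking


def count_invalid_samples(transitions):
    """Count the samples whose predicted transition breaks the pattern."""
    return sum(1 for series in transitions if not is_converging_monotonically(series))


# Illustrative (made-up) predicted values for three samples.
transitions = [
    [100.0, 104.0, 106.0, 107.0],  # rises and levels off
    [100.0, 96.0, 94.0, 93.0],     # falls and levels off
    [100.0, 104.0, 99.0, 105.0],   # changes direction repeatedly
]
print(count_invalid_samples(transitions))
```

Other valid patterns named above (divergence, convergence with alternating direction) would need their own checks; which pattern applies depends on the characteristics of the item.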
Therefore, it is an object of the present invention to provide a model generation system, a prediction system, a model generation method, and a model generation program that can reduce the number of samples in which an invalid transition of a predicted value is output in progress prediction.
A model generation system according to the present invention includes: regularization parameter candidate setting means that outputs a search set which is a search set of regularization parameters used for regularization of a prediction model used for progress prediction performed by fixing some values of a plurality of explanatory variables, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a prediction formula of the prediction model; model learning means that learns, using training data, a prediction model corresponding to each of the plurality of solution candidates included in the search set; accuracy evaluation means that evaluates, using predetermined verification data, a prediction accuracy of each of a plurality of the learned prediction models; transition evaluation means that evaluates, for each of the plurality of the learned prediction models, a graph shape indicated by a transition of a predicted value obtained from the prediction model or a number of defective samples, which is a number of samples for which the transition is not valid, using predetermined verification data; and model determination means that determines a prediction model used for the progress prediction from among the plurality of the learned prediction models based on an evaluation result regarding the prediction accuracy and an evaluation result regarding the graph shape or the number of defective samples.
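One way to picture the determination step is sketched below in Python. The selection rule used here (keep the candidates whose error is within a tolerance of the best error, then take the one with the fewest defective samples) is an assumption for illustration only, as are the names `Candidate`, `select_model`, and `tolerance`; the description above states only that both evaluation results are used.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str               # label of the regularization-parameter candidate
    error: float            # prediction error on verification data (lower is better)
    defective_samples: int  # samples whose predicted transition is not valid


def select_model(candidates, tolerance=0.05):
    """Pick a model using both accuracy and transition validity.

    Hypothetical rule: among candidates whose error is within `tolerance`
    (relative) of the best error, choose the one with the fewest defective
    samples; ties are broken by lower error.
    """
    best_error = min(c.error for c in candidates)
    acceptable = [c for c in candidates if c.error <= best_error * (1 + tolerance)]
    return min(acceptable, key=lambda c: (c.defective_samples, c.error))


# Illustrative (made-up) evaluation results for three learned models.
candidates = [
    Candidate("lambda=0.1", error=1.00, defective_samples=40),
    Candidate("lambda=1.0", error=1.03, defective_samples=12),
    Candidate("lambda=10.0", error=1.30, defective_samples=2),
]
chosen = select_model(candidates)
print(chosen.name)
```

With a rule of this kind, a small loss of accuracy is accepted in exchange for fewer invalid transitions, while models that are clearly less accurate are excluded regardless of their transition behavior.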
Further, the model generation system according to the present invention may include: constrained model evaluation means that evaluates, using predetermined verification data, a prediction accuracy of a constrained model, which is one of prediction models used for progress prediction performed by fixing some values of a plurality of explanatory variables, which is a prediction model that predicts a value of a prediction target item at a prediction time point when a predicted value is obtained, and which is a prediction model in which at least a constraint that a variable other than a main variable indicating a value of the prediction target item at a prediction reference point is not used as a non-control variable that is an explanatory variable whose value changes in the progress prediction is imposed to the explanatory variable; regularization parameter candidate setting means that outputs a search set which is a search set of regularization parameters used for regularization of a calibration model, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a model formula of the calibration model, the calibration model being one of the prediction models used for the progress prediction, the calibration model being a prediction model for predicting a calibration value for calibrating the predicted value obtained in the constrained model for arbitrary prediction target data, and the calibration model being a prediction model including two or more non-control variables and one or more control variables that can be controlled by a person in the explanatory variables; model learning means that learns, using training data, a calibration model corresponding to each of the plurality of solution candidates included in the search set; 
accuracy evaluation means that evaluates, using predetermined verification data, a prediction accuracy of each of the plurality of learned calibration models; transition evaluation means that evaluates, for each of the plurality of learned calibration models, a graph shape indicated by a transition of the predicted value after calibration obtained as a result of calibrating the predicted value obtained from the constrained model with the calibration value obtained from the calibration model or the number of defective samples, which is the number of samples for which the transition is not valid, using predetermined verification data; and model determination means that determines a calibration model used for the progress prediction from among the plurality of learned calibration models based on an index regarding the prediction accuracy and an index regarding the graph shape or the number of defective samples.
Further, a prediction system according to the present invention includes: regularization parameter candidate setting means that outputs a search set which is a search set of regularization parameters used for regularization of a prediction model used for progress prediction performed by fixing some values of a plurality of explanatory variables, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a prediction formula of the prediction model; model learning means that learns, using training data, a prediction model corresponding to each of the plurality of solution candidates included in the search set; accuracy evaluation means that evaluates, using predetermined verification data, a prediction accuracy of each of a plurality of the learned prediction models; transition evaluation means that evaluates, for each of the plurality of the learned prediction models, a graph shape indicated by a transition of a predicted value obtained from the prediction model or a number of defective samples, which is a number of samples for which the transition is not valid, using predetermined verification data; model determination means that determines a prediction model used for the progress prediction from among the plurality of the learned prediction models based on an evaluation result regarding the prediction accuracy and an evaluation result regarding the graph shape or the number of defective samples; model storage means that stores a prediction model used for the progress prediction; and prediction means that when prediction target data is given, performs the progress prediction using the prediction model stored in the model storage means.
A model generation method according to the present invention includes: outputting a search set which is a search set of regularization parameters used for regularization of a prediction model used for progress prediction performed by fixing some values of a plurality of explanatory variables, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a prediction formula of the prediction model; learning, using training data, a prediction model corresponding to each of the plurality of solution candidates included in the search set; evaluating, for each of the plurality of the learned prediction models, each of a prediction accuracy and a graph shape indicated by a transition of a predicted value obtained from the prediction model or a number of defective samples, which is a number of samples for which the transition is not valid, using predetermined verification data; and determining a prediction model used for the progress prediction from among the plurality of the learned prediction models based on an evaluation result regarding the prediction accuracy and an evaluation result regarding the graph shape or the number of defective samples.
A model generation program according to the present invention causes a computer to execute the processes of: outputting a search set which is a search set of regularization parameters used for regularization of a prediction model used for progress prediction performed by fixing some values of a plurality of explanatory variables, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a prediction formula of the prediction model; learning, using training data, a prediction model corresponding to each of the plurality of solution candidates included in the search set; evaluating, for each of the plurality of the learned prediction models, each of a prediction accuracy and a graph shape indicated by a transition of a predicted value obtained from the prediction model or a number of defective samples, which is a number of samples for which the transition is not valid, using predetermined verification data; and determining a prediction model used for the progress prediction from among the plurality of the learned prediction models based on an evaluation result regarding the prediction accuracy and an evaluation result regarding the graph shape or the number of defective samples.
According to the present invention, it is possible to reduce the number of samples in which a transition of a predicted value that is not valid in progress prediction is output.
An exemplary embodiment of the present invention will be described below with reference to drawings. First, terms used in the present invention will be described. Hereinafter, the time point at which the progress prediction starts and for which at least an actual value exists is referred to as a “reference point”. Note that the reference point may be the latest time point having an actual value. Further, hereinafter, the period from the reference point to the earliest prediction time point for which the change over time is desired may be referred to as an “evaluation target period for progress prediction” or simply “evaluation target period”.
For example, when the predicted value is obtained every year, the prediction time unit may be approximately one year (one year ±α). As the prediction time point itself, an arbitrary time point can be designated regardless of the prediction time unit, which is the repeating unit of the prediction model. Specifically, if the value at a certain time point tp is to be predicted, it may be predicted by using an actual value at the time point tp−Δt (the time point one prediction time unit before the prediction time point) or a predicted value corresponding thereto, and a value indicating the prediction condition at the time point tp (for example, items related to lifestyle habits at the prediction time point). Here, Δt corresponds to the prediction time unit.
Hereinafter, a time point (the above-mentioned tp−Δt) traced back by the prediction time unit with respect to an arbitrary prediction time point (the above-mentioned tp) may be referred to as a prediction reference point. Further, the prediction reference point corresponding to the first prediction time point (that is, the prediction time point closest to the reference point) included in the evaluation target period may be referred to as a first prediction reference point. It should be noted that the first prediction reference point is a starting time point in the iteration of the prediction model. Therefore, it can be said that the progress prediction predicts the value of the prediction target item at each prediction time point included in the evaluation target period based on the information at the first prediction reference point. Furthermore, the prediction model used for the progress prediction can be said to be a model that predicts the value of the prediction target item at the prediction time point that is one ahead of the prediction reference point, based on the information at the prediction reference point. In the following, the number of time points (excluding the reference point) included in the evaluation target period may be expressed as N, and the number of prediction time points may be expressed as n.
Note that, as shown in
Next, the explanatory variable in the present invention will be described. The following formula (1) is an example of a prediction model formula (prediction formula) when a linear model is used as the prediction model. Although a linear model is shown as an example of the prediction model for simplification of description, the prediction model is not particularly limited to a linear model, and for example, a piecewise linear model used for heterogeneous mixture learning, a neural network model, or the like may be used.
$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m \tag{1}$$
In formula (1), y is the target variable and x_i (where i=1, . . . , m) is an explanatory variable. Note that m is the number of explanatory variables. Further, a_i (where i=0, . . . , m) is a parameter of the linear model: a_0 is the intercept (constant term), and a_i for i=1, . . . , m is the coefficient of the explanatory variable x_i.
First, consider a case where the prediction target is one inspection item. At this time, the prediction model used for the progress prediction can be considered as a function that outputs a value y(t) of the prediction target item at a time point t using, as an input, a combination of explanatory variables X(t−1)={x1(t−1), x2(t−1), . . . , xm(t−1)} that can be acquired at a prediction reference point (t−1) that is one time point before the time point t as the prediction time point. The number in parentheses on the right shoulder of y, x represents time on the prediction unit time axis.
Furthermore, in the progress prediction, in order to predict the value y(t+1) of the prediction target item at the time point (t+1), as one of the explanatory variables, a variable indicating the value y(t) of the prediction target item at the prediction reference point (t), or a variable calculated from the value y(t) is used. Here, if the explanatory variable is x1, the relationship between the two can be expressed as x1(t)←y(t). Note that x1(t)←y(t) shows that the value y(t) of the prediction target item at a certain time point t is used for one of the explanatory variables (specifically, x1(t) which is the value of the prediction target item at the time point t) in the prediction model for predicting the value y(t+1) of the prediction target item at the next time point (t+1). Then, the prediction model in the progress prediction can be more simply considered as a function that outputs, using, as an input, the combination X(t) of the explanatory variables at a time point t, a predicted value (y(t+1)) corresponding to one of the explanatory variables, for example x1(t+1), at the next time point (t+1) as the predicted value.
Using such a property, in the progress prediction, a combination X(0) of explanatory variables including the explanatory variable x1(0) corresponding to the value y(0) of the prediction target item at the prediction reference point (t=0) is input to the prediction model to obtain the value y(1) of the prediction target item at the first prediction time point (t=1). Then, the combination X(1) of the explanatory variables at the first prediction time point including x1(1) is generated from the y(1) and input to the prediction model, and a value y(2) of the prediction target item at the second prediction time point (t=2) is obtained. After that, similar processing is repeated until the value at the final prediction time point is obtained (see
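The repetition described above can be sketched in Python as follows. This is a minimal illustration that assumes a linear prediction formula of the form (1) with a single main variable followed by control variables; the function names and the coefficient values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def predict_next(coeffs, intercept, features):
    """One application of a linear prediction formula of the form (1)."""
    return intercept + sum(a * x for a, x in zip(coeffs, features))


def progress_prediction(coeffs, intercept, y0, control_values, n_steps):
    """Repeat a one-step-ahead model, feeding each prediction back in.

    The main variable x1 takes the latest value of the prediction target
    item; the control variables are held constant at every step.
    """
    transition = []
    y = y0
    for _ in range(n_steps):
        features = [y] + list(control_values)  # X(t) with x1(t) <- y(t)
        y = predict_next(coeffs, intercept, features)
        transition.append(y)
    return transition


# Hypothetical coefficients: 0.5 on the main variable, 2.0 on one control variable.
transition = progress_prediction(
    coeffs=[0.5, 2.0], intercept=10.0, y0=100.0, control_values=[1.0], n_steps=3
)
print(transition)  # [62.0, 43.0, 33.5]
```

Each pass through the loop corresponds to one prediction time unit, so the returned list is the transition of the predicted value over the evaluation target period.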
Note that, in the above description, the example in which the progress prediction is performed for one inspection item using the prediction model has been shown, but the number of prediction target items is not limited to one. That is, it is considered that the predicted values of a plurality of inspection items and their transitions are obtained using a prediction formula learned for each inspection item. These prediction formulas can be simply expressed as follows. Although a linear model is illustrated as the prediction model in this example as well, the prediction model is not limited to the linear model.
Note that u in formula (2) is the number of inspection items, and u<m. In each prediction formula, xk(t) (where k=1 to u) is an explanatory variable corresponding to the inspection item 1 to u. This shows that not only the past value of the inspection item to be predicted by itself, but also the past values of other inspection items can be used as explanatory variables, and whether they are actually used or not is adjusted by the coefficients of the variables.
In the following, xi(t+1)=yi(t+1) (where i=1 to u, u<m). Then, the above formula (2) can be written as the following formula (3). In addition, a formula (4) is the matrix notation of formula (3).
In the formula (4), the prediction formulas for a plurality of inspection items are collectively shown as one formula using a matrix, but actually, as shown in the formula (2), a prediction formula is retained for each inspection item, without omission of explanatory variables.
It is also possible to simplify the above formula (4) and write it as the formula (5). Note that X(t) and X(t+1) are column vectors, and explanatory variables xk(t) (k=u+1 to m) corresponding to items (inquiry items, etc.) other than inspection items among explanatory variables shall contain the specified values.
$$X^{(t+1)} = A X^{(t)} \tag{5}$$
At this time, the predicted value at the n-th prediction time point (t=n) obtained based on the data at the reference point (t=0) is expressed as follows. Here, A^n represents the n-th power of A.
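Assuming that the same matrix A is applied at every step, as written in formula (5), the expression for the n-th prediction time point follows by repeated substitution. The derivation below is a sketch under that assumption, writing X^{(t)} for the vector X at time point t:

```latex
\begin{aligned}
X^{(1)} &= A X^{(0)}, \\
X^{(2)} &= A X^{(1)} = A^{2} X^{(0)}, \\
        &\;\;\vdots \\
X^{(n)} &= A X^{(n-1)} = A^{n} X^{(0)}.
\end{aligned}
```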
In the above description, the prediction formula for obtaining the predicted value X(n) at the time point +n from the data X(0) at the reference point is defined as formula (6). In the progress prediction as well, the predicted value at a desired time point may be obtained directly, for example, by using a prediction formula such as the formula (6) that directly yields the predicted value at the time point +n.
In the above description, xi(t) and yi(t) do not have to be exactly the same value (the same applies to x1(t) and y(t) when there is one inspection item). For example, the relationship may be xi(t)=yi(t)+α, xi(t)=βyi(t), xi(t)=(yi(t))^γ, xi(t)=sin(yi(t)), or the like (where α, β, and γ are all coefficients). That is, xi(t) as an explanatory variable may be any variable indicating the value yi(t) of the prediction target item at that time point (prediction reference point). Hereinafter, the variable corresponding to the above x1(t) may be referred to as “main variable”. More specifically, the main variable is a variable indicating the value of the prediction target item at the prediction reference point.
The explanatory variables may include variables other than the main variable. Hereinafter, an explanatory variable whose value is made not to change in the progress prediction, that is, an explanatory variable that can be controlled by a person, like the variable indicating the item related to the above-mentioned lifestyle habits, is referred to as a “control variable”. In the following, among explanatory variables, explanatory variables other than “control variables”, that is, explanatory variables whose values change in progress prediction may be referred to as “non-control variables”. Examples of non-control variables include the above main variable, a variable that indicates the value of the prediction target item at an arbitrary time point before the prediction reference point, a variable that indicates an arbitrary statistic (difference, etc.) calculated from that variable and other variables, a variable represented by a one-variable function or a multi-variable function based on those variables, and arbitrary other variables (variables indicating values of items other than the prediction target item at the prediction time point or an arbitrary time point before that), etc.
If the explanatory variables of the prediction model include non-control variables (hereinafter referred to as sub-variables) other than the main variable, when the prediction is repeated, in addition to the above xi(t)←yi(t), the value of each sub-variable may also be xk(t)←(yi(t), . . . ). Here, xk(t) represents an arbitrary sub-variable, and yi(t) represents a variable constituting the sub-variable and an arbitrary variable that can be acquired at the prediction reference point. As a result of model learning, there may be a case where the main variable is not used (coefficient is zero), but only the sub-variable and the control variable are used.
By the way, in general, learning of a prediction model is performed with emphasis on prediction accuracy. In such a prediction accuracy-emphasized learning method (hereinafter referred to as the first method), for example, an optimum solution of the model is obtained by searching for a solution of a model parameter that minimizes a predefined error function. However, since the method does not consider the graph shape in the progress prediction as described above, there is a possibility that many samples that cannot be interpreted when used in the progress prediction are output.
As one of the solutions to the above problem, if the explanatory variable used in the prediction model is limited, and if some kind of constraint is added to the coefficient of the explanatory variable, it is possible to forcibly fit the transition of the predicted value in the progress prediction to a predetermined graph shape (hereinafter referred to as a second method). However, in the second method that emphasizes fitting to a desired graph shape, it is necessary to limit the number of explanatory variables whose values are changed during the progress prediction to one, and there is a problem that the prediction accuracy decreases.
For example, when the latest value of the prediction target item (latest inspection value) is used as the explanatory variable, other inspection values (the inspection value at the two previous time points or the inspection value of other than the prediction target item) or the statistic values using the other inspection values cannot be used as the explanatory variables of the prediction model. Therefore, depending on the nature of the prediction target item and how to select other explanatory variables, it is not possible to properly express the relationship between the target variable (predicted value) and the explanatory variables in the prediction model, and there is a risk that the prediction accuracy may decrease.
In the following exemplary embodiments, a method for reducing the number of samples in which an invalid transition of a predicted value is output in progress prediction will be described without limiting the explanatory variables.
The model generation unit 10 is a processing unit that generates a prediction model used for progress prediction when training data is input and stores the prediction model in the model storage unit 14, and includes a regularization parameter candidate setting unit 11, a model learning unit 12, and a model selection unit 13.
The regularization parameter candidate setting unit 11, when the training data is input, outputs, as a search set of regularization parameters that are parameters used for regularization of the prediction model, a search set in which a plurality of solution candidates are set, the solution candidates having different values of a regularization parameter that affects a term of at least one or more variables (hereinafter, referred to as strong regularization variables) specifically defined among the explanatory variables.
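A search set of this kind could be built as in the following Python sketch. It assumes that each solution candidate consists of one regularization strength for the strong regularization variables and one for the remaining variables; the names `build_search_set`, `lambda_strong`, and `lambda_weak` and the grid values are hypothetical.

```python
from itertools import product


def build_search_set(strong_values, weak_values):
    """Enumerate regularization-parameter candidates.

    Each candidate is a dict with one strength for the strong
    regularization variables and one for the remaining variables.
    Both parameter names are hypothetical.
    """
    return [
        {"lambda_strong": s, "lambda_weak": w}
        for s, w in product(strong_values, weak_values)
    ]


# Candidates differ at least in the strength applied to the strong variables.
search_set = build_search_set(strong_values=[1.0, 10.0, 100.0], weak_values=[0.1])
for candidate in search_set:
    print(candidate)
```

Here the strength for the other variables is fixed at a single value, so the candidates differ only in the parameter that affects the strong regularization variables.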
The regularization is generally performed in order to prevent overlearning and increase generalization capability during model learning, and prevents the value of a model coefficient (coefficient of explanatory variable of the prediction formula) from becoming a large value by adding a penalty term to the error function. Here, the strong regularization variable is a variable that imparts stronger regularization than other explanatory variables. Further, in the following, among the explanatory variables, variables other than “strong regularization variable” may be referred to as “weak regularization variable”.
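To make the idea of imparting stronger regularization to selected variables concrete, the following Python sketch evaluates an error function with a separate penalty weight per explanatory variable. The squared-error loss and the L2 form of the penalty are assumptions for illustration, since the text does not fix the form of the penalty term; `penalized_error` and its arguments are hypothetical names.

```python
def penalized_error(coeffs, intercept, rows, targets, penalties):
    """Squared error plus a per-variable L2 penalty (assumed form).

    penalties[i] is the regularization strength applied to coeffs[i];
    a strong regularization variable simply receives a larger value.
    """
    error = 0.0
    for features, target in zip(rows, targets):
        prediction = intercept + sum(a * x for a, x in zip(coeffs, features))
        error += (target - prediction) ** 2
    penalty = sum(lam * a * a for lam, a in zip(penalties, coeffs))
    return error + penalty


rows = [[1.0, 2.0], [2.0, 1.0]]
targets = [3.0, 3.0]
coeffs = [1.0, 1.0]  # fits the toy data exactly, so only the penalty remains
uniform = penalized_error(coeffs, 0.0, rows, targets, penalties=[1.0, 1.0])
strong = penalized_error(coeffs, 0.0, rows, targets, penalties=[1.0, 10.0])
print(uniform, strong)  # 2.0 11.0
```

Because the same coefficient costs more under the larger penalty weight, minimizing such a function pushes the coefficient of the strong regularization variable toward zero more than the others.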
In the present exemplary embodiment, the explanatory variables are classified as follows according to their uses and properties.
(1) Control variable or non-control variable (including main variable and sub variable)
(2) Strong regularization variable or weak regularization variable
The strong regularization variable may be, for example, one or more variables specifically defined among variables (sub variables and control variables) other than the main variable. Further, the strong regularization variable may be, for example, one or more variables specifically defined among variables (that is, sub variables) other than the main variable among the two or more non-control variables included therein. In the following, an example in which among the explanatory variables, a variable indicating the value of the prediction target item at an arbitrary time point before the prediction reference point (one time point before in a predetermined prediction unit time from the prediction time point as the time point when the predicted value is obtained) or a variable indicating an arbitrary statistic calculated from that variable and other variables is used as a strong regularization variable is shown, but the strong regularization variable is not limited to these.
For example, when the explanatory variables used for the prediction model used in the progress prediction for a certain inspection item within the next three years include (a) the inspection value of last year (non-control variable (main variable)), (b) the difference value between the inspection value of the year before last and the inspection value of last year (non-control variable (sub-variable)), and (c) one or more variables (control variables) corresponding to one or more items related to lifestyle habits, the difference value of (b) may be used as the “strong regularization variable”. The strong regularization variable is not limited to one.
The model learning unit 12 uses the training data to respectively learn the prediction models corresponding to the solution candidates included in the search set of regularization parameters output from the regularization parameter candidate setting unit 11. That is, the model learning unit 12 learns, for each solution candidate of the regularization parameter output from the regularization parameter candidate setting unit 11, a prediction model corresponding to the solution candidate by using the training data.
The model selection unit 13 selects a prediction model that has a high prediction accuracy and has a small number of samples (hereinafter referred to as the number of defective samples) that cannot be interpreted from among a plurality of prediction models (learned models having mutually different values of the regularization parameters that affect the term of the strong regularization variable) learned by the model learning unit 12. The model selection unit 13 may use, for example, predetermined verification data (for example, a data set including a combination of explanatory variables whose target values are known) to calculate the prediction accuracy and the number of defective samples of each learned prediction model and select a predetermined number of prediction models to be used for progress prediction based on the calculated prediction accuracy of each prediction model and the number of defective samples. The number of models to be selected may be one or more. The model selection method by the model selection unit 13 will be described later. The verification data may be the training data used for learning or a part thereof (data divided for verification).
The model storage unit 14 stores the prediction model selected by the model selection unit 13.
When the prediction target data is input, the model application unit 15 uses the prediction model stored in the model storage unit 14 to perform progress prediction. Specifically, the model application unit 15 may apply the prediction target data to the prediction model stored in the model storage unit 14 to calculate the predicted value (value of the prediction target item) at each prediction time point included in the evaluation target period.
Next, the operation of the prediction system of the present exemplary embodiment will be described. The operation of the prediction system of the present exemplary embodiment is roughly divided into a model learning phase in which a prediction model is learned using training data and a prediction phase in which progress prediction is performed using the learned prediction model.
First, the model learning phase will be described. The model learning phase is performed at least once before the prediction phase.
In the example shown in
In step S101, the conditions of the regularization parameter and the like may be accepted together with the training data.
Next, the regularization parameter candidate setting unit 11 outputs a search set in which a plurality of solution candidates having different values of the regularization parameter that affects at least the term of the strong regularization variable among the explanatory variables of the prediction model are set (step S102).
Next, the model learning unit 12 uses the training data to respectively learn the prediction models corresponding to the solution candidates of the regularization parameter output from the regularization parameter candidate setting unit 11 (step S103).
Next, the model selection unit 13 selects a prediction model having a high prediction accuracy and a small number of defective samples from among the plurality of learned prediction models, and stores it in the model storage unit 14 (step S104).
Next, the prediction phase will be described.
In the example shown in
When the prediction target data is input, the model application unit 15 reads the prediction model stored in the model storage unit 14 (step S202), applies the prediction target data to the read prediction model, and obtains a predicted value at each prediction time point included in the evaluation target period (step S203).
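The progress prediction of step S203 can be sketched as follows. This is a minimal illustration, assuming a linear one-step-ahead model with a main variable (the latest value), a difference sub-variable, and control variables held constant; the function name `progress_prediction`, the coefficient dictionary `w`, and all numeric values are hypothetical and do not come from the embodiment.

```python
def progress_prediction(w, last_value, prev_value, controls, n_steps):
    """Roll a one-step-ahead linear model forward n_steps times.

    w is a dict of hypothetical coefficients: 'bias', 'main', 'diff',
    and 'controls' (a list aligned with the control variable values).
    """
    predictions = []
    for _ in range(n_steps):
        diff = last_value - prev_value  # sub-variable x_diff
        y = (w["bias"]
             + w["main"] * last_value
             + w["diff"] * diff
             + sum(c * x for c, x in zip(w["controls"], controls)))
        predictions.append(y)
        # The new predicted value becomes the main variable of the next step;
        # the control variables (lifestyle items) are kept constant.
        prev_value, last_value = last_value, y
    return predictions


# Illustrative values only.
w = {"bias": 1.0, "main": 0.9, "diff": 0.0, "controls": [0.5]}
preds = progress_prediction(w, last_value=10.0, prev_value=10.0,
                            controls=[2.0], n_steps=3)
```

Each call returns one predicted value per prediction time point in the evaluation target period, which is the series whose shape is examined for invalid transitions.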
Next, a method of determining a search set of regularization parameters by the regularization parameter candidate setting unit 11 (a method of setting a plurality of solution candidates) will be described. In the following, a regularization parameter used for a linear model will be described as an example, but the regularization parameter is not limited thereto. Now, assume that a formula (7) is given as the prediction formula of the prediction model.
[Math 2]
$$y(x, w) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_M x_M \tag{7}$$
For the above prediction formula, if the target value is tk (where k=1 to K), the error function for the above prediction formula is expressed as in the following formula (8a) or formula (8b), for example. Here, the target value tk corresponds to correct solution data (actual value, etc.) for the combination of the explanatory variables {x1, . . . , xM} given by each of the K pieces of training data. In the formulas (8a) and (8b), the second term on the right side corresponds to the penalty term.
Here, the regularization parameter corresponds to parameters λ and λj (where j=1 to M) used in the penalty term of the error function. Note that “∥q” in the formula represents a norm, and for example, q=1 (L1 norm) is used in the Lasso method and q=2 (L2 norm) is used in the Ridge regression method. The above description is an example of the regularization parameter used for the linear model, but the regularization parameter is not limited to this.
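One common form of error function consistent with this description is sketched below. The sum-of-squares data term and the factors of 1/2 are assumptions made for illustration; only the roles of λj, λ, the norm exponent q, and the penalty as the second term are taken from the text.

```latex
% (8a)-type: a separate regularization parameter \lambda_j for each coefficient
J(w) = \frac{1}{2}\sum_{k=1}^{K}\bigl(y(x_k, w) - t_k\bigr)^{2}
     + \frac{1}{2}\sum_{j=1}^{M}\lambda_j\,|w_j|^{q}

% (8b)-type: a single regularization parameter \lambda shared by all coefficients
J(w) = \frac{1}{2}\sum_{k=1}^{K}\bigl(y(x_k, w) - t_k\bigr)^{2}
     + \frac{\lambda}{2}\sum_{j=1}^{M}|w_j|^{q}
```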
Usually, the regularization parameter is determined by searching for a solution of λ={λ1, . . . λM} (hereinafter referred to as the minimum solution) that minimizes the output J(w) of the error function including the penalty term. At this time, λ may be set to one value for all the coefficients, or may be set to a different value for each coefficient. Note that if zero is set for all the coefficients, it means that regularization is not performed. In either case, one solution is determined for λ.
The regularization parameter candidate setting unit 11 of the present exemplary embodiment performs the following regularization parameter candidate setting process in place of or in addition to the normal regularization process. That is, the regularization parameter candidate setting unit 11 sets a plurality of solution candidates λ′, λ″, λ′″, . . . having different values for the regularization parameter that affects at least the term of the strong regularization variable (in this example, the difference variable xdiff) among the terms included in the prediction formula of the prediction model, and outputs them as a search set of the regularization parameter α. Note that, each solution candidate may be one regularization parameter λ that is commonly set for all the terms included in the prediction formula, or may be a set of a regularization parameter λj that is set for each of the terms included in the prediction formula.
For example, assume that the strong regularization variable is x2 in the prediction formula (7). In that case, when the error function is expressed by, for example, the formula (8a), the regularization parameter candidate setting unit 11 sets a plurality of solution candidates λ(1), λ(2), . . . having different values at least for the regularization parameter λ2 corresponding to the term of x2. The parenthesized superscript is an identifier of the solution candidate. At this time, the regularization parameters λj′ (where j′≠2) corresponding to the other terms may be set to an arbitrary value such as the minimum solution value or zero. It should be noted that the arbitrary value includes a value specified by the user, a value of a solution other than the minimum solution obtained as a result of adjustment, by another method, by a processing unit other than the regularization parameter candidate setting unit 11 of the present exemplary embodiment, or the like. Further, for example, when the error function is expressed by the formula (8b), the regularization parameter candidate setting unit 11 sets a plurality of solution candidates λ(1), λ(2), . . . having different values at least for the regularization parameter λ that affects the term of x2.
Incidentally, when determining the search set of regularization parameters, the regularization parameter candidate setting unit 11 may, in each solution candidate, either set a fixed value for the other regularization parameters (e.g., the regularization parameters corresponding to terms other than the term of the strong regularization variable) or set different values for them. The regularization parameter candidate setting unit 11 may set the regularization parameter that affects the term of the strong regularization variable and the other regularization parameters at the same time, or may set them separately. In any case, it is assumed that the solution candidates differ in at least the value of the regularization parameter that affects the term of the strong regularization variable.
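Learning one prediction model per solution candidate can be sketched as follows. This is a minimal illustration, assuming a linear model, an L2 penalty (q=2) with a separate parameter per coefficient, an unpenalized intercept, and synthetic data; `fit_weighted_ridge` and all values are hypothetical and are not the learning procedure of the embodiment.

```python
import numpy as np


def fit_weighted_ridge(X, t, lambdas):
    """Minimize 0.5*||X w + w0 - t||^2 + 0.5*sum_j lambdas[j]*w_j^2.

    The intercept w0 is left unpenalized by centering X and t first.
    """
    x_mean, t_mean = X.mean(axis=0), t.mean()
    Xc, tc = X - x_mean, t - t_mean
    # Normal equations of the penalized least-squares problem.
    w = np.linalg.solve(Xc.T @ Xc + np.diag(lambdas), Xc.T @ tc)
    w0 = t_mean - x_mean @ w
    return w0, w


# Synthetic training data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
t = 1.0 + X @ np.array([0.8, 0.5, -0.3]) + 0.1 * rng.normal(size=200)

strong_index = 1                         # x2 is the strong regularization variable
candidates = [0.0, 10.0, 100.0, 1000.0]  # candidate values for lambda_2

models = {}
for lam2 in candidates:
    lambdas = np.zeros(X.shape[1])       # other terms: zero (no regularization)
    lambdas[strong_index] = lam2
    models[lam2] = fit_weighted_ridge(X, t, lambdas)
```

In this sketch the coefficient of x2 shrinks toward zero as its candidate value grows, while the other coefficients remain free to fit the data.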
When the strong regularization variable includes a statistic represented by a multivariable function, the terms of the strong regularization variable are decomposed into variables used in the multivariable function, and then the regularization parameters are defined for the prediction formula after decomposition. For example, as shown in following formula (9), when the prediction model includes a variable xy1 (main variable) indicating a value of inspection item 1 at the prediction reference point, a variable xdiff (sub-variable and strong regularization variable) that indicates the difference between the value of inspection item 1 at the prediction reference point and the value of inspection item 1 at the time point immediately before the prediction reference point, and an arbitrary control variable, the prediction formula is rewritten as shown in following formula (10). In the formula (10), xy2 is a variable indicating the value of inspection item 1 at the time point immediately before the prediction reference point. That is, xdiff=xy1−xy2. The terms for strong regularization in the formula (10) are the xdiff term (second term) and the xy2 term (third term). Therefore, the regularization parameter candidate setting unit 11 may generate a plurality of solution candidates having respectively different values set for at least the regularization parameter λ2 corresponding to the xdiff term (second term) and the regularization parameter λ3 corresponding to the xy2 term (third term).
Therefore,
$$y_1 = (a - c)\,x_{y1} + (b + c)\,x_{diff} + c\,x_{y2} + \dots + d \tag{10}$$
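As a check on formula (10), substituting xdiff=xy1−xy2 shows that, for any value of c, the xy1, xdiff, and xy2 terms collapse to a form containing only xy1 and xdiff. That this two-term form is the one given by formula (9) is an assumption here, inferred from the variables listed for it.

```latex
\begin{aligned}
(a-c)\,x_{y1} + (b+c)\,x_{diff} + c\,x_{y2}
  &= (a-c)\,x_{y1} + (b+c)\,(x_{y1} - x_{y2}) + c\,x_{y2} \\
  &= (a+b)\,x_{y1} - b\,x_{y2} \\
  &= a\,x_{y1} + b\,(x_{y1} - x_{y2}) \\
  &= a\,x_{y1} + b\,x_{diff}
\end{aligned}
```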
Note that, the method of setting a plurality of solution candidates in the regularization parameter candidate setting unit 11 is not particularly limited. The regularization parameter candidate setting unit 11 can set solution candidates by the following method, for example.
(1) External condition setting
(2) Grid search
(3) Random setting
(1) External Condition Setting
The regularization parameter candidate setting unit 11 may set the search set (a plurality of solution candidates) explicitly given from the outside as it is.
(2) Grid Search
Further, for example, when the search range (for example, 0≤λ≤500) of the regularization parameter and the grid width (for example, 100) are given from the outside as conditions of the regularization parameter, the regularization parameter candidate setting unit 11 may use the grid points ({0,100,200,300,400,500}) obtained by dividing the given search range into a grid as solution candidates for the regularization parameter corresponding to one strong regularization target term.
(3) Random Setting
Further, for example, when a distribution of the regularization parameter and the number of generations are given from the outside as conditions of the regularization parameter, the regularization parameter candidate setting unit 11 may sample the specified number of points from the given distribution, and use the obtained sampled values as solution candidates of the regularization parameter corresponding to one strong regularization target term. The solution candidates are not limited to these, and may be those that the regularization parameter candidate setting unit 11 determines by calculation or based on external input.
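The grid search and random setting methods above can be sketched as follows. This is a minimal illustration; the function names, the uniform distribution, and the fixed seed are assumptions, while the search range 0 to 500 and the grid width 100 are the example values from the text.

```python
import random


def grid_candidates(lower, upper, width):
    """Grid points from lower to upper (inclusive) spaced by width."""
    n_points = int((upper - lower) // width) + 1
    return [lower + i * width for i in range(n_points)]


def random_candidates(sampler, n_generations, seed=0):
    """Draw n_generations values using sampler(rng), a caller-supplied distribution."""
    rng = random.Random(seed)
    return [sampler(rng) for _ in range(n_generations)]


# Grid search: range 0 <= lambda <= 500 with grid width 100.
grid = grid_candidates(0, 500, 100)

# Random setting: here a uniform distribution on [0, 500] is assumed.
sampled = random_candidates(lambda rng: rng.uniform(0, 500), n_generations=5)
```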
The following is an example of a search set of regularization parameters in which a plurality of such solution candidates are set. Note that the following example is an example in which the strong regularization variable is x2. In this example, v2-1 to v2-k correspond to the values set in the regularization parameter λ2 corresponding to the term of the strong regularization variable x2.
If there are a plurality of strong regularization variables, a plurality of λ solutions (solution candidates) having at least different values expressed by the combination of regularization parameters λreg_1 to λreg_p (p is the number of strong regularization variables) corresponding to each term of the strong regularization variable may be output. An example of such a plurality of λ solution candidates is shown below. Note that the following example is an example in which the strong regularization variables are x2 and x3. v2-x (x is arbitrary) corresponds to the value of the regularization parameter λ2 corresponding to the term of the strong regularization variable x2. v3-x (x is arbitrary) corresponds to the value of the regularization parameter λ3 corresponding to the term of the strong regularization variable x3.
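A search set for a plurality of strong regularization variables can be sketched as follows. This is a minimal illustration, assuming that every combination of the per-term candidate values is enumerated and that the other terms receive a fixed default; the text only requires that the candidates differ in the combination of values, so the full Cartesian product is one possible choice. `build_search_set` and the values are hypothetical.

```python
from itertools import product


def build_search_set(n_terms, strong_values, default=0.0):
    """Build one list of lambda_j values per solution candidate.

    strong_values maps a 0-based term index to its list of candidate values.
    """
    indices = sorted(strong_values)
    search_set = []
    for combo in product(*(strong_values[i] for i in indices)):
        lambdas = [default] * n_terms
        for i, value in zip(indices, combo):
            lambdas[i] = value
        search_set.append(lambdas)
    return search_set


# Strong regularization variables x2 and x3 (indices 1 and 2 when x1 has index 0).
search_set = build_search_set(4, {1: [0.0, 10.0], 2: [1.0, 5.0, 50.0]})
```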
Next, the model selection method by the model selection unit 13 will be described in more detail. The model selection unit 13, for example, as a result of performing prediction using predetermined verification data, may select a model having the smallest number of defective samples from among the models with the prediction accuracy (e.g., Root Mean Squared Error (RMSE) or correlation coefficient) equal to or more than a predetermined threshold. Examples of the threshold for the prediction accuracy include XX% or less of the prediction accuracy of the model with the highest prediction accuracy, a threshold (correlation coefficient 0.6, etc.) set by the user or domain expert, and the like.
In addition, the model selection unit 13, for example, as a result of performing prediction using predetermined verification data, may select a model having the highest prediction accuracy from among the models with the number of defective samples equal to or less than a predetermined threshold. Examples of the threshold for the number of defective samples include a threshold (1 digit etc.) set by the user or domain expert (an expert or the like in a technical field normally handling a predicted value, such as machine learning field and medical field), and XX% or less for the number of defective samples of the model having the smallest number of defective samples.
Further, the model selection unit 13 may obtain an accuracy score showing a positive correlation with respect to the obtained prediction accuracy and obtain a shape score showing a negative correlation with respect to the number of defective samples, and select a predetermined number of models in descending order of total score, which is represented by the sum of the two. In this way, it is possible to easily select a desired model by adjusting a positive weight used for calculating the accuracy score from the prediction accuracy and a negative weight used for calculating the shape score from the number of defective samples.
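The three selection methods described above can be sketched as follows. This is a minimal illustration, assuming that prediction accuracy is a higher-is-better index such as a correlation coefficient and that each model has already been evaluated on verification data; the function names, weights, and evaluation values are hypothetical.

```python
def select_by_accuracy_then_defects(models, min_accuracy):
    """Among models meeting the accuracy threshold, pick the fewest defective samples."""
    eligible = [m for m in models if m["accuracy"] >= min_accuracy]
    return min(eligible, key=lambda m: m["defective"]) if eligible else None


def select_by_defects_then_accuracy(models, max_defective):
    """Among models meeting the defective-sample threshold, pick the most accurate."""
    eligible = [m for m in models if m["defective"] <= max_defective]
    return max(eligible, key=lambda m: m["accuracy"]) if eligible else None


def select_by_total_score(models, w_acc=1.0, w_shape=-0.01, top_n=1):
    """Total score = accuracy score (positive weight) + shape score (negative weight)."""
    def total(m):
        return w_acc * m["accuracy"] + w_shape * m["defective"]
    return sorted(models, key=total, reverse=True)[:top_n]


# Illustrative evaluation results only.
models = [
    {"name": "lam=0", "accuracy": 0.82, "defective": 40},
    {"name": "lam=100", "accuracy": 0.80, "defective": 5},
    {"name": "lam=1000", "accuracy": 0.55, "defective": 0},
]
best = select_by_total_score(models)[0]
```

With these values all three methods pick the middle model, which gives up a little accuracy for far fewer defective samples.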
In addition, as a method for determining whether or not the sample is a defective sample, the following methods can be given.
(1) Determine based on an error from an asymptotic line or a desired shape line in a predetermined confirmation period.
(2) Determine based on a difference in a slope of a vector (hereinafter referred to as transition vector α) that connects the predicted values between the two time points in the predetermined confirmation period.
(3) Determine based on an angle (hereinafter referred to as transition angle β) formed by two straight lines represented by two adjacent transition vectors in a predetermined confirmation period.
(4) Determine based on an amount of change in the predicted value in the predetermined confirmation period.
Here, the confirmation period is a period set for confirmation of a graph shape, and means a period in which at least two values including the actual value and the predicted value are given, and at least one prediction time point is included from the time point when the actual value is given. Note that the same number of time points and number of prediction time points as those in the evaluation target period may be used in the confirmation period.
For example, in the above method (1), when a predicted value at each time point included in the confirmation period is obtained for a combination of explanatory variables, an amount of error between the predicted value and a curve (approximate curve) obtained by curve fitting to the transition of the predicted value, that is, the sum of differences between the value at each time point on the approximate curve and each predicted value is obtained as an asymptotic score. Then, if the asymptotic score is equal to or more than the predetermined threshold, the combination may be determined to be a defective sample.
Further, in the above method (1), for example, when the model selection unit 13 has obtained a predicted value at each time point included in the confirmation period for a combination of explanatory variables, it may calculate an invalidity score, which is an index indicating the invalidity of the transition of the predicted value. More specifically, the invalidity score is calculated for series data that includes data indicating the obtained predicted values and that includes three or more pieces of data indicating the value of the prediction target item in association with time, based on the error between the series data and a curve model obtained by fitting the series data to a predetermined function form. The model selection unit 13 may then determine the combination to be a defective sample if the calculated invalidity score is equal to or larger than a predetermined threshold. In addition, the invalidity score is not only used to determine whether the sample is a defective sample or not, but can also be used as it is as an evaluation index regarding a graph shape by defining an index showing a negative correlation with the calculated invalidity score as a shape score. The details of the calculation method of the invalidity score will be described later.
Further,
$$wscore = |d_1 - d_2| + |d_2 - d_3| + \dots + |d_{n-1} - d_n| \tag{11}$$
The wscore of this example shows a small value when the amount of change in the slope is small, and a large value when the amount of change in the slope is large. Therefore, the model selection unit 13 may determine that the shape is defective (a sample that cannot be interpreted) if the wscore is equal to or larger than a predetermined threshold.
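The wscore of formula (11) can be sketched as follows. This is a minimal illustration, assuming that each d_i is the slope of the i-th transition vector connecting adjacent points of the series and that the time points are equally spaced unless given; the function name and arguments are hypothetical.

```python
def wscore(values, times=None):
    """Sum of absolute changes in slope between adjacent transition vectors."""
    if times is None:
        times = list(range(len(values)))  # assumed unit spacing
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]
    return sum(abs(slopes[i] - slopes[i + 1]) for i in range(len(slopes) - 1))


straight = wscore([0, 1, 2, 3])  # constant slope: no change
zigzag = wscore([0, 2, 0, 2])    # slope flips sign at every step
```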
Further, for example, in the above method (3), the transition angles β1 to β3 may be obtained from the transition vectors α1 to α4 as shown in
$$vscore = \frac{1}{\pi}(\pi - \beta_1) + \dots + \frac{1}{\pi}(\pi - \beta_{n-1}) \tag{12}$$
The vscore in this example shows a larger value as the transition angle becomes smaller. Therefore, the model selection unit 13 may determine that the shape is defective (the sample that cannot be interpreted) if the vscore is equal to or larger than a predetermined threshold, for example.
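The vscore of formula (12) can be sketched as follows. This is a minimal illustration, assuming that the transition angle β at each interior point is the angle between the segment back to the previous point and the segment forward to the next point, so that β=π on a straight line; the function name is hypothetical, and the angles depend on how the time and value axes are scaled.

```python
import math


def vscore(values, times=None):
    """Sum of (pi - beta) / pi over the transition angles beta of the series."""
    if times is None:
        times = list(range(len(values)))  # assumed unit spacing
    points = list(zip(times, values))
    score = 0.0
    for prev, mid, nxt in zip(points, points[1:], points[2:]):
        u = (prev[0] - mid[0], prev[1] - mid[1])  # toward the previous point
        v = (nxt[0] - mid[0], nxt[1] - mid[1])    # toward the next point
        dot = u[0] * v[0] + u[1] * v[1]
        cross = u[0] * v[1] - u[1] * v[0]
        beta = math.atan2(abs(cross), dot)        # angle in [0, pi]
        score += (math.pi - beta) / math.pi
    return score


straight = vscore([0, 1, 2, 3])  # beta = pi at every interior point
zigzag = vscore([0, 1, 0, 1])    # beta = pi/2 at every interior point
```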
Further, for example, in the above method (4), when two predicted values (y1, y2) are given, the ascore is obtained as follows, and if the obtained ascore is equal to or more than a threshold, the shape may be determined to be defective.
$$ascore = \begin{cases} |y_1 - y_2| - a & (\text{when } |y_1 - y_2| \ge a) \\ 0 & (\text{otherwise}) \end{cases}$$
Note that when three or more predicted values are given, the sum of absolute values of differences between adjacent time points may be used instead of |y1−y2|.
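The ascore can be sketched as follows. This is a minimal illustration that uses the sum of absolute differences between adjacent values, which reduces to |y1−y2| when two values are given; the function name is hypothetical.

```python
def ascore(values, a):
    """Amount by which the total change in the series exceeds the allowance a."""
    change = sum(abs(values[i] - values[i + 1]) for i in range(len(values) - 1))
    return change - a if change >= a else 0.0


over = ascore([10, 13], a=2)       # change 3 exceeds a by 1
within = ascore([10, 11], a=2)     # change 1 is within a
multi = ascore([10, 12, 11], a=2)  # total change 2 + 1 = 3
```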
The model selection unit 13 may use the wscore, vscore, or ascore instead of or in addition to the number of defective samples when evaluating the model.
Further,
As shown in
As described above, according to the present exemplary embodiment, it is possible to generate a model with a good balance between the prediction accuracy and the number of defective samples without limiting the explanatory variables used in the prediction model. Therefore, for example, even when the progress prediction of a certain inspection value is performed in a health simulation or the like by keeping the items related to lifestyle habits constant, it is possible to reduce the possibility of presenting an uninterpretable prediction result while maintaining the prediction accuracy.
In the above description, an example is shown in which model learning to model selection is performed by a single search for regularization parameters (setting of solution candidates), but it is also possible to further include a higher-order module that controls the three modules of the regularization parameter candidate setting unit 11, the model learning unit 12, and the model selection unit 13, so that a better range can be searched sequentially.
In the example shown in
The model evaluation unit 131 evaluates at least a prediction accuracy for each of a plurality of prediction models (learned models having mutually different values of the regularization parameter corresponding to the strong regularization variable) learned by the model learning unit 12, the prediction model corresponding to each solution candidate included in the search set in which the regularization parameter is determined by the regularization parameter candidate setting unit 11. The model evaluation unit 131 may apply, for example, predetermined verification data to each of the plurality of learned prediction models, and evaluate the prediction accuracy based on the difference between the obtained predicted value and the target value. Note that the model evaluation unit 131 may further obtain the number of defective samples, wscore, vscore, shape score, etc. when having performed the progress prediction as an index related to the graph shape (the graph shape indicating the transition of the predicted value at the time of the progress prediction), and evaluate the graph shape based on the obtained index. The evaluation result of the model evaluation unit 131 is stored in the model storage unit 14.
It should be noted that such a model evaluation unit 131 can also be implemented by utilizing a part of the function (function of evaluating a model) of the model selection unit 13 of the first exemplary embodiment. In that case, the function (model evaluation function) may store, in the model storage unit 14, the evaluation index (prediction accuracy, accuracy score, number of defective samples, shape score, total score, wscore, vscore, etc.) of each model obtained based on the verification data, together with the model information.
When the prediction target data is input, the model application unit 15 of the present example reads out a plurality of learned prediction models stored in the model storage unit 14, applies the prediction target data to each read prediction model, and obtains the predicted value at each prediction time point included in the evaluation target period for each model.
The model determination unit 132, based on the evaluation result of each model stored in the model storage unit 14 and the predicted value at each prediction time point included in the evaluation target period for each model obtained by the model application unit 15, obtains the final predicted value by selecting (determining) one model that obtains the final predicted value from among the plurality of models that have obtained the predicted values. For example, the model determination unit 132, after evaluating the graph shape indicated by the prediction result (predicted value at each prediction time point) obtained from the current prediction target data, may determine a model based on the evaluation result (current evaluation result) and the evaluation result (past evaluation result) of each model stored in the model storage unit 14.
Similarly to the above, the evaluation of the graph shape can be performed by determining whether or not the sample is a defective sample, and obtaining the wscore, vscore, shape score, and the like.
Further, as the model determination method in the model determination unit 132, the model selection method by the model selection unit 13 described above can be used. At this time, the model determination unit 132 uses the evaluation result stored in the model storage unit 14 for the evaluation index regarding the prediction accuracy such as the prediction accuracy and the accuracy score. On the other hand, the model determination unit 132, for the number of defective samples, and the evaluation index regarding the graph shape such as the wscore, vscore, and shape score, may use only the current evaluation result, or the evaluation result obtained by putting together the current evaluation result and the past evaluation result (the evaluation result obtained by adding the current evaluation result to the evaluation result stored in the model storage unit 14).
For example, when only the current evaluation result is used as the evaluation index regarding the graph shape, a model most suitable for the current sample, that is, for the current prediction target data (for example, a model that outputs the most accurate predicted value while excluding the predicted value indicating an invalid change) can be selected. Further, for example, when the evaluation result obtained by putting together the current evaluation result and the past evaluation result is used as an evaluation index regarding the graph shape, a model most suitable for many samples not limited to the current sample (for example, a model with a good balance between the prediction accuracy and the number of defective samples) can be selected.
When the model determination unit 132 determines one model, the predicted value at each prediction time point included in the evaluation target period obtained from the model is output as the final predicted value.
Note that other points may be similar to those of the prediction system shown in
Further, in the above description, an example in which the prediction system 100 includes the model generation unit 10 and the model generation unit 10A has been shown, but, for example, it is also possible to store the prediction model as described above in the model storage unit 14 in advance. In that case, the model generation unit 10 and the model generation unit 10A can be omitted.
Next, a second exemplary embodiment of the present invention will be described.
The prediction system 200 shown in
The constrained model generation unit 21 is a processing unit that generates a prediction model using the second method, that is, a method emphasizing fitting to a desired graph shape, and includes a constraint imparting unit 211, a model learning unit 212, and a model evaluation unit 213. Hereinafter, a prediction model generated by the constrained model generation unit 21 will be referred to as a constrained prediction model.
The constrained prediction model is not particularly limited as long as it is a prediction model that does not use two or more non-control variables such as a main variable and sub-variables for explanatory variables of the prediction model. The constrained prediction model may be a linear model or a piecewise linear model on which in addition to the constraint that two or more non-control variables are not used, for example, a constraint that in the prediction formula, a coefficient of an explanatory variable corresponding to one non-control variable (for example, a main variable) falls within a range of value designated in advance is imposed. Such a constrained prediction model can be generated by limiting the explanatory variables used in the prediction formula to one non-control variable (for example, main variable) and one or more control variables, and then performing model learning by imposing the constraint so that the coefficient of the non-control variable falls within a range of value designated in advance.
In addition, when the prediction model is a piecewise linear model including a section definition, a constraint that only the control variable is used for the section definition may be further imposed. This makes it possible to prevent the prediction formulas from switching in the piecewise linear model when the control variable is constant.
By adding such a constraint, it is possible to prevent a predicted value indicating an invalid transition from being output. In addition, the range of value of the coefficient of the non-control variable may be defined according to a predetermined type as a valid transition type in the value of the prediction target item.
Due to the above constraints, the prediction formula of the prediction model can be expressed as, for example, formula (13). Here, xmain represents the main variable, a represents the coefficient of the main variable, xq represents the column vector of the control variable, θT represents the row vector of the coefficient of the control variable, and b represents the intercept. Although the main variable is used as one non-control variable in the formula (13), one non-control variable may be other than the main variable.
Then, by solving a recurrence formula of the prediction formula (13) with t=0, a general formula indicating the value of the prediction item at the n-th prediction time point can be derived as formula (14).
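A form of the prediction formula and of its general solution that is consistent with the symbols defined above is sketched below. The exact notation is an assumption; in particular, the superscript (t) for the prediction time point and the condition a≠1 are introduced here for illustration.

```latex
% Assumed form of the constrained one-step prediction formula,
% where the main variable is the value of the prediction item at time t:
y^{(t+1)} = a\,x_{main}^{(t)} + \theta^{T} x_q + b,
\qquad x_{main}^{(t)} = y^{(t)}

% Unrolling from t = 0 with the control variables x_q held constant (a \neq 1):
y^{(n)} = a^{n}\,y^{(0)} + \frac{1 - a^{n}}{1 - a}\,\bigl(\theta^{T} x_q + b\bigr)
```

Under this form, when 0<a<1 the term a^n vanishes as n grows, so the predicted value approaches (θT xq + b)/(1 − a).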
After limiting the prediction formula used in the prediction model to the prediction formula expressed by such a recurrence formula, by limiting the range of value of the coefficient (a described above) of the non-control variable to a predetermined range, it is possible to prevent the prediction formula from outputting a predicted value that shows an invalid change under the condition that the control variable is constant. As an example, under such a constraint, for example, if the range of value of the coefficient a of the non-control variable is limited to 0<a<1, then the graph shape can be limited to an upward asymptotic shape as shown in
The range of value of the coefficient a of the non-control variable is not limited to the above, and may be any value that matches a predetermined valid transition type of the predicted value.
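The effect of limiting the coefficient a can be checked numerically. The following is a minimal sketch, not the formulas of the specification themselves: since formulas (13) and (14) are not reproduced in this text, it assumes that the prediction formula has the form x(t+1) = a·x(t) + c, where c stands for the constant term θT·xq + b obtained when the control variable is held constant. The function names and the numerical values are hypothetical.

```python
def predict_iterative(x0, a, c, n):
    """Apply the assumed recurrence x(t+1) = a * x(t) + c for n steps."""
    x = x0
    values = []
    for _ in range(n):
        x = a * x + c
        values.append(x)
    return values


def predict_closed_form(x0, a, c, n):
    """Value at the n-th prediction time point from the solved recurrence.

    Valid for a != 1. The term c / (1 - a) is the asymptote when |a| < 1.
    """
    limit = c / (1.0 - a)
    return limit + (a ** n) * (x0 - limit)


# Hypothetical values: c plays the role of theta^T x_q + b (held constant).
a, c, x0 = 0.5, 10.0, 4.0
series = predict_iterative(x0, a, c, 5)
print(series)                              # step-by-step predictions
print(predict_closed_form(x0, a, c, 5))    # same value as series[-1]
```

With 0<a<1 the distance to the asymptote c/(1−a) shrinks by the factor a at every step, so the series approaches the asymptote monotonically and cannot oscillate or diverge while the control variable is constant.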
The constraint imparting unit 211 imparts the above-described constraint to the model learning unit 212 in the subsequent stage.
The model learning unit 212 learns a model using given first training data under the constraint given by the constraint imparting unit 211. Here, the first training data (training data 1 in the figure) given to the model learning unit 212 may be a data set in which the value of a prediction target item which is the target variable in the constrained prediction model, and a plurality of variables (more specifically, a combination of the plurality of variables) that can be correlated with the value of the prediction target item and are explanatory variables in the constrained prediction model are associated with each other. For example, the first training data may be data obtained by removing the sub-variables from among the plurality of variables that are the explanatory variables in the training data of the first exemplary embodiment. For example, as in the first exemplary embodiment, the constrained model generation unit 21 may receive training data whose explanatory variables include the main variable, the sub-variables and the control variable, and the constraint imparting unit 211 may generate the training data to be given to the model learning unit 212 by performing data processing that excludes the explanatory variables corresponding to the sub-variables from the input training data (for example, setting their values to zero or deleting them from the data structure itself).
The model evaluation unit 213 evaluates the prediction accuracy of the constrained model learned by the model learning unit 212. The model evaluation unit 213 may, for example, apply predetermined verification data to each of the learned constrained prediction models, and evaluate the prediction accuracy based on the difference between the obtained predicted value and the target value. At this time, the model evaluation unit 213 calculates a value (for example, residual, log residual, residual sum of squares, etc.) used for calibration of the predicted value obtained from the prediction model (learned constrained prediction model) in the prediction phase. The value used for calibration (hereinafter referred to as a calibration value) is not limited to the above example as long as it can be calculated from the target value (value of the target variable) indicated by the given first training data, and a predicted value obtained by applying the training data to the learned constrained prediction model. In the following, the calibration value calculated based on the target value indicated by the training data and the predicted value actually acquired from the combination of the explanatory variables associated with the target value may be referred to as an actual calibration value.
The learned constrained model is stored in the model storage unit 24. At this time, for the purpose of learning the calibration model described later, information that associates the actual calibration value obtained from the training data with the combination of the explanatory variables having obtained the actual calibration value may also be stored together.
Further, the calibration model generation unit 22, using the method shown in the first exemplary embodiment, generates a plurality of learned prediction models corresponding to each of a plurality of solution candidates having at least different values for the regularization parameter that affects the term of the predetermined strong regularization variable. However, the prediction model generated by the calibration model generation unit 22 of the present exemplary embodiment is a prediction model for predicting the calibration value for the predicted value output by the constrained prediction model for arbitrary prediction target data. Hereinafter, the prediction model generated by the calibration model generation unit 22 will be referred to as a calibration model.
That is, the calibration model may be any prediction model whose target variable is a variable indicating a calibration value for the predicted value output from the constrained prediction model for arbitrary prediction target data. The explanatory variable is not particularly limited, but may be a variable indicating a value that can be acquired at the prediction reference point and may be a variable that can be correlated with the calibration value. Note that the explanatory variables of the calibration model may be a plurality of variables that can be correlated with the value of the prediction target item at the prediction time point one ahead of the prediction reference point, as in the first exemplary embodiment. In other words, the explanatory variables may be those obtained by further adding one or more non-control variables to the plurality of variables that are the explanatory variables in the constrained model.
The regularization parameter candidate setting unit 221, the model learning unit 222, and the model selection unit 223 in the calibration model generation unit 22 may be basically the same as the regularization parameter candidate setting unit 11, the model learning unit 12, and the model selection unit 13 of the first exemplary embodiment except that the prediction model to be generated is the calibration model, that is, the target variables are different.
That is, the regularization parameter candidate setting unit 221, when the second training data (training data 2 in the figure) is input, outputs a search set in which a plurality of solution candidates are set, the solution candidates having at least different values of the regularization parameter that affects the term of the strong regularization variable among the regularization parameters that are parameters used for regularization of the calibration model.
The model learning unit 222 uses the second training data to learn the calibration model corresponding to each solution candidate included in the search set of the regularization parameter output from the regularization parameter candidate setting unit 221.
The model selection unit 223 selects a prediction model having high prediction accuracy and a small number of defective samples from among the calibration models respectively corresponding to the plurality of solution candidates of the regularization parameter learned by the model learning unit 222, and stores it in the model storage unit 24.
However, in the calibration model generation unit 22 (regularization parameter candidate setting unit 221, model learning unit 222 and model selection unit 223) of the present exemplary embodiment, the predicted value and the target value of the prediction model are the predicted value (calibration value) and the target value (actual calibration value) of the calibration model. As the target value (actual calibration value) of the calibration model, for example, one included in the input second training data may be used. That is, if the second training data is a data set in which the combination of the explanatory variables and the actual calibration value in the combination of the explanatory variables are associated with each other, the actual calibration value included in the second training data may be used. Incidentally, if the second training data is a data set in which the combination of the explanatory variables and the value (the target value of the constrained model) of the actual prediction item in the combination of the explanatory variables are associated with each other, the actual calibration value may be calculated from the predicted value obtained by applying the combination (however, non-control variables other than one non-control variable are excluded) to the constrained model and the target value.
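The calculation of the actual calibration value from the second training data can be sketched as follows. This is only an illustration under stated assumptions: the calibration value is taken to be the residual (target value minus predicted value), which is one of the examples named above, and the constrained model is assumed to be linear in one main variable and the control variables. The function names, the array layout, and the numerical values are hypothetical.

```python
import numpy as np


def constrained_predict(x_main, x_ctl, a, theta, b):
    """Assumed constrained model: a * x_main + theta^T x_ctl + b.

    Sub-variables are intentionally not accepted, mirroring the exclusion
    of non-control variables other than the one non-control variable.
    """
    return a * x_main + x_ctl @ theta + b


def actual_calibration_values(x_main, x_ctl, y_true, a, theta, b):
    """Residual between the target value and the constrained prediction."""
    return y_true - constrained_predict(x_main, x_ctl, a, theta, b)


# Hypothetical second training data with two samples.
x_main = np.array([100.0, 120.0])
x_ctl = np.array([[1.0, 0.0], [0.0, 1.0]])
y_true = np.array([105.0, 104.0])
residuals = actual_calibration_values(
    x_main, x_ctl, y_true, a=0.5, theta=np.array([2.0, -2.0]), b=50.0
)
print(residuals)
```

The resulting residuals would then serve as the target values when the calibration model is learned with the full set of explanatory variables, including the sub-variables.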
When the prediction target data is input, the model application unit 25 uses the constrained model and the calibration model stored in the model storage unit 24 to perform progress prediction. Specifically, the model application unit 25 may apply the prediction target data to the constrained model and the calibration model stored in the model storage unit 24 to obtain the value of the prediction target item at each prediction time point included in the evaluation target period. The prediction target data input to the model application unit 25 may be a combination of a plurality of arbitrary variables including at least the explanatory variable of the constrained model and the explanatory variable of the calibration model.
When the prediction target data is input, the constrained model application unit 251 uses the constrained model stored in the model storage unit 24 to perform progress prediction of the value of the prediction target item. Specifically, the constrained model application unit 251 may read the constrained model stored in the model storage unit 24, apply the prediction target data (excluding, however, the non-control variables other than the one non-control variable, for example, the sub-variables) to the constrained model, and obtain a predicted value (the value of the prediction target item; hereinafter called a first predicted value) at each prediction time point included in the evaluation target period. At this time, the constrained model application unit 251 performs a process of excluding the non-control variables other than the one non-control variable from the prediction target data as required.
When the prediction target data is input, the calibration model application unit 252 uses the calibration model stored in the model storage unit 24 to acquire the calibration value of the first predicted value at each prediction time point obtained from the constrained model. The calibration model application unit 252, for example, may read the calibration model stored in the model storage unit 24, apply the prediction target data to the calibration model, and obtain the calibration value of the first predicted value at each prediction time point included in the evaluation target period.
The calibration unit 253 calibrates the first predicted value at each prediction time point included in the evaluation target period obtained by the constrained model application unit 251, based on the calibration value for the value obtained by the calibration model application unit 252. Then, the first predicted value at each prediction time point after calibration is output as the final predicted value, that is, the value of the prediction target item at each prediction time point.
Next, the operation of the prediction system of the present exemplary embodiment will be described. The operation of the prediction system of the present exemplary embodiment is also roughly divided into a model learning phase for learning a prediction model (constrained model and calibration model) using training data, and a prediction phase for performing progress prediction using the learned prediction model.
First, the model learning phase will be described. The model learning phase is performed at least once before the prediction phase.
In the example shown in
In step S301, information on constraints given to the constrained model may be accepted together with the training data.
Then, the constrained model generation unit 21, based on the input first training data, generates a constrained model on which the above constraint is imposed (step S302), and saves it in the model storage unit 24 (step S303).
When the constrained model is stored in the model storage unit 24, the calibration model generation unit 22 evaluates the constrained model, calculates an actual calibration value that is a target value of the calibration model to generate the second training data, and then inputs the second training data (step S304). In this example, as the second training data, a data set indicating a combination of variables including the calibration value (a target value of the calibration model) of the value of the prediction target item at a certain time point t′, a main variable and one or more sub-variables when the time point (t′−1) immediately before the time point t′ is the prediction reference point, and an arbitrary control variable is input.
In step S304, the regularization parameter conditions and the like may be accepted together with the second training data.
Next, the regularization parameter candidate setting unit 221 outputs a search set in which a plurality of solution candidates are set, the solution candidates having different values of the regularization parameter that affects at least the term of the strong regularization variable among the explanatory variables of the calibration model (step S305).
Next, the model learning unit 222 uses the second training data to respectively learn the calibration model corresponding to each solution candidate of the regularization parameter output from the regularization parameter candidate setting unit 221 (step S306).
Next, the model selection unit 223 selects a calibration model having high prediction accuracy and a small number of defective samples from among the plurality of learned calibration models, and stores it in the model storage unit 24 (step S307).
Next, the prediction phase will be described.
In the example shown in
When the prediction target data is input, the model application unit 25 uses the constrained model and the calibration model stored in the model storage unit 24 to perform progress prediction (step S411 to step S412, step S421 to step S422, and step S423).
When the prediction target data is input, for example, the constrained model application unit 251 of the model application unit 25 reads the constrained model from the model storage unit 24 (step S411), applies the prediction target data (however, the sub-variables are excluded) to the read constrained model, and obtains a predicted value (first predicted value) at each prediction time point included in the evaluation target period (step S412).
Further, when the prediction target data is input, for example, the calibration model application unit 252 of the model application unit 25 reads the calibration model from the model storage unit 24 (step S421), applies the prediction target data to the read calibration model, and obtains a predicted value (calibration value of the first predicted value) at each prediction time point included in the evaluation target period (step S422).
Then, the calibration unit 253 of the model application unit 25 calibrates the first predicted value at each prediction time point using the calibration value to obtain a final predicted value at each prediction time point (step S423).
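The flow of steps S412, S422, and S423 can be sketched as follows. This is a simplified illustration, not the implementation of the embodiment: the constrained model is assumed to be a linear recurrence in the main variable with the control-variable term held constant, the calibration values are assumed to be residual-type values that are simply added to the first predicted values, and the calibration values themselves are supplied as given numbers rather than computed by a learned calibration model. All names and values are hypothetical.

```python
def first_predictions(x_main, ctl_term, a, b, n_steps):
    """Step S412 (sketch): roll the assumed constrained model forward.

    ctl_term stands for theta^T x_ctl, constant over the evaluation period.
    """
    preds = []
    x = x_main
    for _ in range(n_steps):
        x = a * x + ctl_term + b
        preds.append(x)
    return preds


def calibrate(first_preds, calibration_values):
    """Step S423 (sketch): additive calibration with residual-type values."""
    if len(first_preds) != len(calibration_values):
        raise ValueError("one calibration value is needed per time point")
    return [p + r for p, r in zip(first_preds, calibration_values)]


first = first_predictions(x_main=100.0, ctl_term=2.0, a=0.5, b=50.0, n_steps=3)
# Placeholder for the output of the calibration model (step S422).
calib = [1.0, 0.5, -0.5]
final = calibrate(first, calib)
print(first)
print(final)
```

If the calibration value were defined as a log residual or another quantity, the additive step in `calibrate` would have to be replaced by the corresponding inverse operation.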
As described above, according to the present exemplary embodiment, since it is possible to calibrate the predicted value obtained by the constrained model limited so as to output the predicted value having the predetermined graph shape, with the calibration value by the calibration model in consideration of both the prediction accuracy and the shape accuracy, it is possible to obtain the same effect as that of the first exemplary embodiment. That is, it is possible to reduce the possibility of presenting an uninterpretable prediction result while maintaining the prediction accuracy.
Note that
Further, although not shown in the drawings, also in the present exemplary embodiment, a part of the function (model selection processing) by the model selection unit 223 may be performed in the prediction phase. In that case, a model determination unit may be further provided after the model application unit 25. Note that the model determination method by the model determination unit may be basically the same as that of the model determination unit 132 of the first exemplary embodiment.
Also in the present exemplary embodiment, it is also possible to previously store the constrained model and the calibration model as described above in the model storage unit 24. In that case, the constrained model generation unit 21 and the calibration model generation unit 22 can be omitted.
In each of the above-described exemplary embodiments, an example in which a model is selected based on the evaluation result regarding the prediction accuracy and the evaluation result regarding the graph shape or the number of defective samples is shown. However, a timing to perform these evaluations is not limited to the time for model selection. For example, after the model selection, it is also possible to further perform these evaluations in the shipping determination or the like of the prediction result of the progress prediction performed using the model.
In that case, for example, a shipping determination unit (not shown) may input the transition of the predicted value obtained from the model to be shipped or the transition of the predicted value to be shipped (more specifically, series data that includes data indicating the obtained predicted value and that includes two or more pieces of data indicating the value of the prediction target item in association with time), perform the evaluation regarding the prediction accuracy as described above or the evaluation regarding the graph shape or the number of defective samples on the input series data, and, based on those evaluation results, make a shipping determination for the model that has obtained the predicted value included in the series data or for the predicted value included in the series data.
The shipping determination may be made, for example, by determining whether or not the total score represented by the sum of the accuracy score indicating a positive correlation with the obtained prediction accuracy and the shape score indicating a negative correlation with the number of defective samples is equal to or more than a predetermined threshold. In shipping determination, only the evaluation regarding the graph shape or the number of defective samples may be performed, and the availability of shipping may be determined based on the obtained evaluation result.
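A threshold-based shipping determination of this kind can be sketched as follows. The concrete score definitions are assumptions made only for illustration: the accuracy score is taken as 1/(1+RMSE), which increases with the prediction accuracy, and the shape score is taken as the fraction of non-defective samples, which decreases as the number of defective samples grows. The function names and the threshold value are hypothetical.

```python
def accuracy_score(rmse):
    """Assumed accuracy score: larger when the prediction error is smaller."""
    return 1.0 / (1.0 + rmse)


def shape_score(n_defective, n_samples):
    """Assumed shape score: larger when fewer samples are defective."""
    return 1.0 - n_defective / n_samples


def can_ship(rmse, n_defective, n_samples, threshold):
    """Shipping is allowed when the total score reaches the threshold."""
    total = accuracy_score(rmse) + shape_score(n_defective, n_samples)
    return total >= threshold


print(can_ship(rmse=1.0, n_defective=1, n_samples=10, threshold=1.2))
print(can_ship(rmse=1.0, n_defective=8, n_samples=10, threshold=1.2))
```

When only the evaluation regarding the graph shape is performed, the same comparison could be made using `shape_score` alone.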
Further, the shipping determination unit, for example, when the predicted values for a plurality of samples are obtained from one prediction model, if all the series data including the predicted value in each sample can be shipped, may determine that the shipment of the predicted values included in them is OK, and if other than that, may perform predetermined alert processing. Further, for example, when the series data is obtained from each of a plurality of prediction target items, the shipping determination unit may evaluate them individually or may collectively evaluate them (collective evaluation). As an example, the shipping determination unit may collectively evaluate, for each shipping unit that is a unit in which the predicted value or the prediction model is shipped, the series data including the predicted value in the shipping unit.
As the alert processing, for example, an alert may be output to a predetermined server or a display device together with the series data of the prediction target item that is determined to be unshippable, and a manual shipment availability determination may be requested. Further, the result of the manual shipment availability determination may be accepted.
It should be noted that it is also possible to provide a shipping determination device that semi-automatically performs input and shipping determination of such series data independently of a device or a system that performs prediction.
Next, a health simulation system will be described as an example of using the prediction system according to each of the above-described exemplary embodiments.
Each of the prediction units 31-1 to 31-u is realized by any of the above prediction systems. Specifically, each prediction unit 31-i (where i=1 to u) predicts and outputs, for one inspection item i (for example, any of the inspection items 1 to u related to lifestyle-related diseases) that is predetermined as a prediction target item, a value of the prediction target item at each prediction time point included in the specified evaluation target period, based on the input prediction target data.
Incidentally, each prediction unit 31 preliminarily stores at least one prediction model (or a pair of a constrained model and a calibration model) that has, as a target variable, the target value yi(t′) of the prediction target item at a certain time point t′, and that is learned using training data including at least, as explanatory variables, a variable xmain(t′−1) corresponding to the value of the prediction target item at the time point t′−1, values {xsub_j(t′−1)} (where j=1 to p) of one or more arbitrary variables other than xmain(t′−1) that can be acquired at the time point t′−1, and values {xctl_i(t′−1)} (where i=1 to q) of one or more inquiry items related to lifestyle habits that can be acquired at the time point t′−1.
Here, xmain(t′−1) corresponds to the main variable, xsub_j(t′−1) corresponds to the sub-variable, and xctl_i(t′−1) corresponds to the control variable. Here, xsub_j(t′−1) may be, for example, a variable xmain(t′−2) indicating a value of the prediction target item at the time point t′−2 or a variable xdiff(t′−1)(=xmain(t′−1)−xmain(t′−2)) indicating the difference between a value of the prediction target item at the time point t′−1 and a value of the prediction target item at the time point t′−2. The sub-variables are not limited to these variables. For example, a sub-variable may be a statistical value (including a difference) calculated from the value of the prediction target item at the time point t′−1 and the value of another inspection item at the time point t′−1.
For example, xmain(t′−1) in the prediction unit 31-i may be an explanatory variable corresponding to xi(t) in the above formula (3). Also, for example, xsub_j(t′−1) may be an explanatory variable corresponding to xi′(t) (where i′=1 to u, but excluding i) in the above formula (3).
Note that xctl_i(t′−1) is not particularly limited as long as it is a value that can be acquired and can be controlled at the time point t′−1. For example, xctl_i(t′−1) may be a value of the inquiry item related to lifestyle habits at the time point t′, which corresponds to the first prediction time point seen from the time point t′−1. In that case, it means that a future value is set for the inquiry item related to lifestyle habits at the time point t′−1.
The simulation unit 32 uses the prediction units 31-1 to 31-u to perform a simulation regarding the health of a designated employee or the like. In addition to the employee, the simulation target is not particularly limited as long as it is a target for which the past actual value (in this example, the result of the health checkup) is obtained regarding the prediction target item.
The simulation unit 32, for the specified user, if the current mode is a risk simulation mode in which the progress prediction is performed with the inquiry items (control variables) fixed, may use, as the prediction target data, a combination of explanatory variables represented by xmain(t), {xsub_j(t)}, and {xctl_i(t)}, which is obtained at the current time with the current time as the first prediction reference point t in each of the prediction units 31-1 to 31-u. Further, as a result, the simulation unit 32 may obtain xmain(t+1) to xmain(t+n) corresponding to the predicted value at each prediction time point (time point t+1 to time point t+n) included in the evaluation target period from each of the prediction units 31-1 to 31-u. At this time, the simulation unit 32 causes each of the prediction units 31 to calculate the predicted value on the assumption that the value {xctl_i(t)} of the inquiry item set at the time point t is maintained at each prediction time point.
In addition, in this example, each of the prediction units 31 is configured to be able to refer to the predicted values of the other prediction units 31.
In addition, the simulation unit 32, for the specified user, if the current mode is a lifestyle habits improvement simulation mode in which the progress prediction is performed without fixing the inquiry item (control variable), may use, as the prediction target data, a combination of explanatory variables represented by xmain(t), {xsub_j(t)}, and {{xctl_i(t)}, . . . , {xctl_i(t+n−1)}}, which is obtained at the current time point with the current time point as the first prediction reference point t in each of the prediction units 31-1 to 31-u. The above description means that the value set at each prediction reference point is specified for the control variable. As a result, the simulation unit 32 may obtain the variables xmain(t+1) to xmain(t+n) from each of the prediction units 31-1 to 31-u. The value of the inquiry item may be set, for example, by accepting a user operation on the screen.
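The difference between the two modes can be sketched as follows: in the risk simulation mode the same control-variable values are reused at every prediction reference point, whereas in the lifestyle habits improvement simulation mode a value is specified for each prediction reference point. The sketch assumes a linear one-step model in the main variable and the control variables and omits the sub-variables for brevity. The function name, the coefficients, and the inquiry-item values are hypothetical.

```python
def simulate(x_main, ctl_schedule, a, theta, b):
    """Roll the assumed one-step model forward.

    ctl_schedule holds one list of control-variable values per prediction
    reference point (t, t+1, ..., t+n-1); the result holds the predicted
    values for t+1, ..., t+n.
    """
    preds = []
    x = x_main
    for ctl in ctl_schedule:
        x = a * x + sum(w * v for w, v in zip(theta, ctl)) + b
        preds.append(x)
    return preds


n = 3
theta = [4.0, -2.0]          # hypothetical coefficients of two inquiry items
current_ctl = [1.0, 0.0]     # answers at the current time point t

# Risk simulation mode: the current answers are maintained for n steps.
risk = simulate(100.0, [current_ctl] * n, a=0.5, theta=theta, b=50.0)

# Improvement simulation mode: answers are specified per reference point.
improved_schedule = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
improved = simulate(100.0, improved_schedule, a=0.5, theta=theta, b=50.0)

print(risk)
print(improved)
```

Plotting `risk` and `improved` on the same time axis gives the kind of side-by-side comparison of the two transitions that the simulation unit 32 may present.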
Further, the simulation unit 32 uses xmain(t+1) to xmain(t+n) obtained from each of the prediction units 31-1 to 31-u to display the predicted value at each prediction time point of each inspection item. In this example, from each of the prediction units 31-1 to 31-u, variables xmain(t+1) to xmain(t+n) corresponding to the predicted value at each prediction time point for the prediction target item corresponding to the prediction unit 31 are obtained.
Further, the simulation unit 32 may present the transition of the predicted value obtained in the risk simulation mode and the transition of the predicted value obtained in the lifestyle habits improvement simulation mode so that they can be compared on a graph.
In addition, the simulation unit 32 can cause each of the prediction units 31 to comprehensively acquire, in advance, the prediction result obtained when the value of only one inquiry item regarding lifestyle habits is changed from the value selected by the current user, while switching the inquiry item to be changed, and can then identify and present, for each prediction target item, a predetermined number of top-ranked lifestyle habits that have a high improvement effect.
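Such a one-item-at-a-time search can be sketched as follows. The sketch rests on several assumptions that are not stated in the text: each inquiry item is a binary answer (0 or 1), the improvement effect is measured as the decrease of the predicted value at the last prediction time point, a lower value of the prediction target item is regarded as better, and the one-step model is linear with the sub-variables omitted. All names and values are hypothetical.

```python
def predict_final(x_main, ctl, a, theta, b, n_steps):
    """Predicted value after n_steps with the control variables held fixed."""
    x = x_main
    for _ in range(n_steps):
        x = a * x + sum(w * v for w, v in zip(theta, ctl)) + b
    return x


def rank_habits(x_main, ctl, a, theta, b, n_steps, top_k):
    """Indices of the inquiry items whose single change helps the most."""
    baseline = predict_final(x_main, ctl, a, theta, b, n_steps)
    effects = []
    for i in range(len(ctl)):
        changed = list(ctl)
        changed[i] = 1.0 - changed[i]   # flip one binary answer (assumption)
        value = predict_final(x_main, changed, a, theta, b, n_steps)
        effects.append((baseline - value, i))   # positive means improvement
    effects.sort(reverse=True)
    return [i for _, i in effects[:top_k]]


top = rank_habits(
    x_main=100.0, ctl=[1.0, 0.0, 1.0], a=0.5,
    theta=[4.0, -2.0, 1.0], b=50.0, n_steps=3, top_k=2,
)
print(top)
```

In a system with several prediction units 31, the same search would be repeated for each prediction target item, using the model held by the corresponding prediction unit.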
In each of the above exemplary embodiments, t represents a time point corresponding to the prediction unit time, but if the prediction target item changes a value by repeating some action or treatment, it is also possible that t represents the number of times the action or treatment is performed. Even in that case, in terms of expression, t may represent a time point corresponding to each time.
Next, the above-described method of calculating the invalidity score will be described in more detail. Hereinafter, the model selection unit 13 will be described as including a score calculation unit (not shown), but the model selection unit 13 may also directly perform the process of the score calculation unit.
The score calculation unit inputs series data including three or more pieces of data indicating the value of the prediction target item in association with time, at least one of these pieces of data indicating a predicted value, and calculates and outputs the invalidity score, which is an index representing the invalidity of the series data.
In the following, the invalidity score is calculated as an index indicating how far the input series data is from the predetermined asymptotic model.
Here, the asymptotic model is a curve model that represents a curve having an asymptote parallel to the X-axis when time is the X-axis and the prediction item is the Y-axis, and more specifically, a curve model expressed by a function in which y(x) converges to a certain value when x→∞. Here, x represents a point (coordinate) on the time axis corresponding to each piece of data in the series data, and y(x) represents the value of the prediction item at the time point x. The asymptotic model may be a curve model represented by a function that satisfies at least the condition represented by the following formula (c1). Here, α is an arbitrary constant. The number of asymptotes is not limited to one, and the asymptotic model includes, for example, a model represented by a function having two asymptotes, such as a logistic function or an arctangent function.
[Math 6]
$$\lim_{x \to \infty} y(x) = \alpha \qquad \text{(c1)}$$
Although the score calculation unit can also use a curve model represented by one predetermined function form as the asymptotic model, for example, the model obtained by fitting the input series data to two or more predetermined functional forms that satisfy the above conditions may be used.
The fitting may be performed, for example, by searching for a solution (θ with a hat) of a model parameter θ that minimizes a predetermined loss function as shown in the following formula (c2).
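[Math 7]
$$\hat{\theta} = \operatorname*{arg\,min}_{\theta} \sum_{n} \mathrm{loss}\bigl(f(x_n, \theta),\, y_n\bigr) \qquad \text{(c2)}$$
ex)
$$\mathrm{loss}(y_1, y_2) = (y_1 - y_2)^2, \qquad f(x, \theta) = c + b\,a^{x} \qquad \text{s.t. } \theta = \{a, b, c\},\ 0 < a < 1$$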
In the formula (c2), n represents the time point of the value for which fitting is performed, loss( ) represents the error function, and f( ) represents the function form of the fitting destination. It should be noted that f(xn, θ) represents the output when an arbitrary time point xn and a set of model parameters θ are given to the function form f( ), and f(xn, θ̂) represents the output at an arbitrary time point xn in the asymptotic model obtained by fitting. In the example shown in the formula (c2), the square loss is used as the error function, but the error function is not limited to the square loss.
The score calculation unit may calculate, for example, an error between the asymptotic model thus obtained and the series data of the input predicted values, and output the error as an invalidity score.
The score calculation unit may output, for example, an error value (error) represented by the following formula (c3) as an invalidity score.
[Math 8]
$$\mathrm{error} = \sum_{n} \mathrm{loss}\bigl(f(x_n, \hat{\theta}),\, y_n\bigr) \qquad \text{(c3)}$$
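The fitting of formula (c2) and the error of formula (c3) can be sketched as follows for the example function form f(x, θ) = c + b·a^x with the square loss. The optimization strategy is an assumption chosen for simplicity and is not prescribed by the text: a is searched on a grid inside (0, 1), and for each candidate a the parameters b and c are obtained by linear least squares. The function names and the sample series are hypothetical.

```python
import numpy as np


def fit_asymptotic(x, y, a_grid=None):
    """Fit f(x) = c + b * a**x with 0 < a < 1 under the square loss.

    For each candidate a on the grid, b and c follow from linear least
    squares; the candidate with the smallest loss is returned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if a_grid is None:
        a_grid = np.linspace(0.01, 0.99, 99)
    best = None
    for a in a_grid:
        design = np.column_stack([np.ones_like(x), a ** x])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        loss = float(np.sum((design @ coef - y) ** 2))
        if best is None or loss < best[0]:
            best = (loss, float(a), float(coef[1]), float(coef[0]))
    _, a, b, c = best
    return a, b, c


def invalidity_score(x, y):
    """Sum of square losses between the fitted curve and the series."""
    a, b, c = fit_asymptotic(x, y)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((c + b * a ** x - y) ** 2))


x = [0, 1, 2, 3, 4]
smooth = [4.0, 12.0, 16.0, 18.0, 19.0]    # follows 20 - 16 * 0.5**x
zigzag = [10.0, 20.0, 5.0, 25.0, 0.0]     # oscillating, hard to interpret
print(invalidity_score(x, smooth))
print(invalidity_score(x, zigzag))
```

A series that already follows an asymptotic curve yields a score close to zero, whereas an oscillating series cannot be reproduced by the monotone function form and therefore receives a large score.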
The score calculation unit may independently specify the data used for fitting and the data used for calculating the error. The score calculation unit can also accept these designations from the user. At this time, the data used for the fitting (the data belonging to the first group) and the data used for calculating the error (the data belonging to the second group) do not have to completely match.
As an example, when series data including N pieces of data is input, the fitting may be performed using the first N′ pieces of data (where N′<N), and the error calculation may be performed using the remaining pieces of data (N−N′ pieces) or all pieces of data (N pieces). In addition, as another example, it is also possible to perform fitting using data at time points that are not continuous in the series data, such as the first, third, and fifth data, and perform error calculation using all pieces of data.
For example, when series data including five pieces of data is input, the score calculation unit may perform fitting and error calculation in any of the following ways.
The fitting is performed with the first three pieces of data, and the error calculation is performed with the remaining two pieces of data.
The fitting is performed with the first three pieces of data, and the error calculation is performed with all pieces of data.
The fitting is performed with the first, third, and fifth data, and the error calculation is performed with all pieces of data.
Here, the number of pieces of data (the number of predicted values) Np included in the series data input to the score calculation unit is not particularly limited, but it is assumed that at least one is included. In practice, it is preferable that the series data include at least data indicating predicted values for the number of time points at which the progress prediction was performed. Note that the series data may include data indicating past actual values, and in that case, the above N represents the total number of pieces of data including data indicating past actual values. Note that N is presumed to be three or more, but from the viewpoint of fitting accuracy, for example, four or more is more preferable.
In addition, when the series data includes the data indicating the actual value, the error calculation may be performed using only the data indicating the predicted value.
In addition, the score calculation unit may rescale the x-coordinate, which is the value of the X-axis (the value representing the time corresponding to the value of each prediction item), before performing the fitting. When displaying series data as a graph, it is conceivable that the unit of a numerical value (scale width) may differ greatly between the vertical axis (Y-axis: predicted value) and the horizontal axis (X-axis: Time), such that the scale unit of the vertical axis is 50, the scale unit of the horizontal axis is 1, etc. In that case, even if the fitting is performed using the numerical values as they are, the asymptotic model expected by the viewer cannot be obtained. This is because the fitting is performed in the graph shape when the values that are greatly different in scale on the X axis and the Y axis are displayed in a graph at equal intervals, so the fitting is performed to a curve having a shape different from a curve that is a valid graph shape when actually displayed.
In order to eliminate such inconvenience, it is preferable to rescale the time value (x coordinate) associated with each data included in the series data according to the actual display, and then perform fitting.
For example, when parameters related to the display (display parameters), such as the scale setting of the graph that displays the input series data, are obtained, the score calculation unit may convert the x coordinate so that the width of the X-axis main scale has the same unit as the width of the Y-axis main scale (50 units in the above example). In this case, the unit of the width of the X-axis main scale is 1, whereas the unit of the width of the Y-axis main scale is 50 (50 times larger), so the x coordinate is also multiplied by 50. Hereinafter, such a scaling factor for rescaling the x axis may be referred to as a rescale parameter x_scale.
In addition, the score calculation unit can also obtain the rescale parameter x_scale as follows. The score calculation unit may receive, together with the series data, display parameters that are used when displaying the series data, and may calculate the rescale parameter x_scale based on information obtained from the series data and on the display parameters.
The following formula (c4) is an example of a formula for calculating the rescale parameter x_scale. In the formula (c4), ymax and ymin represent the maximum value and the minimum value of the prediction item included in the series data, respectively. Nd represents the number of pieces of data (the number of points of the prediction item to be displayed) included in the series data. Further, Ar represents the aspect ratio (that is, the ratio of the horizontal width to the vertical width) of the display graph of the series data. In the formula (c4), 0.8 represents the display ratio in the vertical direction and 0.9 represents the display ratio in the horizontal direction, but these values may be adjusted as appropriate.
x_scale=((ymax−ymin)/0.8*Ar*0.9)/(Nd−1) (c4)
In the example shown in the formula (c4), ymax, ymin, and Nd correspond to the information obtained from the series data, and Ar, the vertical display ratio, and the horizontal display ratio correspond to the display parameters.
For example, when the aspect ratio is 1:2 (Ar=2) and five points are displayed in the frame, x_scale is calculated as follows.
x_scale=((ymax−ymin)/0.8*2*0.9)/4
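The calculation of formula (c4) can be sketched as a small function. This is only an illustration under stated assumptions: the function name compute_x_scale, its keyword arguments, and the sample values are hypothetical, and the defaults 0.8 and 0.9 are the adjustable display ratios named in the text.

```python
def compute_x_scale(values, aspect_ratio, v_ratio=0.8, h_ratio=0.9):
    """Rescale parameter per formula (c4).

    values       : values of the prediction item in the series data
    aspect_ratio : Ar, horizontal width / vertical width of the graph
    v_ratio      : display ratio in the vertical direction (0.8 in c4)
    h_ratio      : display ratio in the horizontal direction (0.9 in c4)
    """
    nd = len(values)
    if nd < 2:
        raise ValueError("at least two pieces of data are needed (Nd - 1 > 0)")
    y_range = max(values) - min(values)
    # Frame height in y units, widened by Ar, of which h_ratio is used by
    # the data, divided into Nd - 1 equal intervals.
    return (y_range / v_ratio * aspect_ratio * h_ratio) / (nd - 1)


# Hypothetical series data: five points, aspect ratio 1:2 (Ar = 2).
series = [100.0, 110.0, 120.0, 130.0, 140.0]
x_scale = compute_x_scale(series, aspect_ratio=2.0)
print(round(x_scale, 6))  # (40 / 0.8 * 2 * 0.9) / 4 = 22.5
```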
In the above example, assuming that the unit of time that is associated with each data included in the series data is the prediction unit (that is, the number that increases by 1 each time the prediction time point increases by 1), x_scale is calculated as an index representing the x-direction interval of each data expressed in the unit of the y-axis. Therefore, when the unit of time in the series data is other than the prediction unit time, or when the unit of the x direction interval when displaying is other than 1, the x coordinate associated with each data may be divided by the prediction unit time or the unit of the x-direction interval to set the unit of the x axis to 1, and then multiplied by x_scale.
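The normalization and multiplication described above can be sketched as follows. This is an assumption-laden illustration: the name rescale_x, the parameter unit (the prediction unit time or the x-direction display interval), and the month-based sample values are hypothetical.

```python
def rescale_x(times, x_scale, unit=1.0):
    """Divide each time value by the unit so that one prediction step
    equals 1, then multiply by x_scale to express it in y-axis units."""
    return [t / unit * x_scale for t in times]


# Hypothetical case: time is recorded in months, predictions are yearly.
times_in_months = [0, 12, 24, 36, 48]
rescaled = rescale_x(times_in_months, x_scale=22.5, unit=12)
print(rescaled)  # [0.0, 22.5, 45.0, 67.5, 90.0]
```

The fitting would then be performed on the rescaled x coordinates paired with the original values of the prediction item.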
The score calculation unit can also accept the designation of x_scale. For example, the score calculation unit may input x_scale together with the series data. Note that the score calculation unit can also calculate the x_scale by inputting the above-mentioned display parameters together with the series data.
The server and other devices included in the prediction system and the health simulation system according to each of the above-described exemplary embodiments may be installed in the computer 1000. In that case, the operation of each device may be stored in the auxiliary storage device 1003 in the form of a program. The CPU 1001 reads the program from the auxiliary storage device 1003, expands it in the main storage device 1002, and executes the predetermined processing in each exemplary embodiment according to the program. The CPU 1001 is an example of an information processing device that operates according to a program, and may include, in addition to the CPU (Central Processing Unit), for example, MPU (Micro Processing Unit), MCU (Memory Control Unit), and GPU (Graphics Processing Unit), etc.
The auxiliary storage device 1003 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible medium include a magnetic disk, a magneto-optical disk, CD-ROM, DVD-ROM, a semiconductor memory, or the like that is connected via the interface 1004. When the program is distributed to the computer 1000 through a communication line, the computer 1000 having received the distribution may expand the program into the main storage device 1002 and execute predetermined processing in each exemplary embodiment.
Further, the program may be a program for realizing a part of the predetermined processing in each exemplary embodiment. Further, the program may be a difference program that realizes predetermined processing in each exemplary embodiment in combination with another program already stored in the auxiliary storage device 1003.
The interface 1004 transmits/receives information to/from other devices. The display device 1005 also presents information to the user. Further, the input device 1006 accepts input of information from the user.
Further, depending on the processing content in the exemplary embodiment, some elements of the computer 1000 can be omitted. For example, the display device 1005 can be omitted if the computer 1000 does not present information to the user. For example, if the computer 1000 does not accept information input from the user, the input device 1006 can be omitted.
Also, some or all of the components in each of the above-described exemplary embodiments are implemented by a general-purpose or dedicated circuit (circuitry), a processor or the like, or a combination thereof. These may be configured by a single chip or may be configured by a plurality of chips connected via a bus. Some or all of the components in each of the above-described exemplary embodiments may be realized by a combination of the above-described circuitry and the like and a program.
When some or all of the components in each of the above-described exemplary embodiments are realized by a plurality of information processing devices or circuits, the plurality of information processing devices or circuits may be centrally arranged or distributed. For example, the information processing device, the circuit, and the like may be realized as a form in which a client and server system, a cloud computing system, and the like are connected to each other via a communication network.
Next, a summary of the present invention will be described.
The regularization parameter candidate setting means 61 (for example, the regularization parameter candidate setting unit 11, the regularization parameter candidate setting unit 221) outputs a search set which is a search set of regularization parameters used for regularization of a prediction model used for progress prediction performed by fixing some values of a plurality of explanatory variables, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a prediction formula of the prediction model.
The model learning means 62 (for example, the model learning unit 12, the model learning unit 222) learns a prediction model corresponding to each of a plurality of solution candidates included in the search set using the training data.
The accuracy evaluation means 63 (for example, a part of the model selection unit 13, a part of the model selection unit 223, the model evaluation unit 131), evaluates the prediction accuracy of each of the plurality of learned prediction models using the predetermined verification data.
The transition evaluation means 64 (for example, a part of the model selection unit 13, a part of the model selection unit 223, the model evaluation unit 131, the model determination unit 132) evaluates, for each of the plurality of learned prediction models, the graph shape indicated by the transition of the predicted value obtained from the prediction model or the number of defective samples, which is the number of samples for which the transition is not valid, using predetermined verification data.
The model determination means 65 (for example, a part of the model selection unit 13, a part of the model selection unit 223, the model determination unit 132) determines, based on the evaluation result regarding the prediction accuracy and the evaluation result regarding the graph shape or the number of defective samples, a single prediction model to be used for the progress prediction from among the plurality of learned prediction models.
With the above configuration, it is possible to generate a model having a good balance between the prediction accuracy and the number of defective samples. Therefore, in the progress prediction, it is possible to reduce the possibility of presenting an uninterpretable prediction result while maintaining the prediction accuracy.
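The flow through the means 61 to 65 can be sketched as follows. This is a toy illustration under several assumptions that are not taken from the embodiment: the prediction model is y_next = w1*y + w2*d with no control variables, the difference d from the previous value plays the role of the strong regularization variable, a transition is treated as not valid when it changes direction, and the selection rule (lowest error among candidates within a defect limit) is only one possible way of balancing the two evaluations. All function names and data values are hypothetical.

```python
def triples(seq):
    """(previous, current, next) samples from one person's series."""
    return [(seq[i - 1], seq[i], seq[i + 1]) for i in range(1, len(seq) - 1)]


def fit(samples, lam):
    """Least squares for y_next = w1*y + w2*d, penalizing only w2."""
    rows = [(yc, yc - yp, t) for yp, yc, t in samples]
    a11 = sum(y * y for y, d, t in rows)
    a12 = sum(y * d for y, d, t in rows)
    a22 = sum(d * d for y, d, t in rows) + lam  # penalty on w2 only
    r1 = sum(y * t for y, d, t in rows)
    r2 = sum(d * t for y, d, t in rows)
    det = a11 * a22 - a12 * a12
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det


def progress(w, y_prev, y_cur, steps=3):
    """Repeat one-step prediction, feeding each result back in."""
    out = []
    for _ in range(steps):
        y_next = w[0] * y_cur + w[1] * (y_cur - y_prev)
        out.append(y_next)
        y_prev, y_cur = y_cur, y_next
    return out


def is_defective(seq):
    """Assumed validity rule: defective if the series changes direction."""
    diffs = [b - a for a, b in zip(seq, seq[1:])]
    return any(p * q < 0 for p, q in zip(diffs, diffs[1:]))


def evaluate(w, samples):
    """Return (RMSE of one-step prediction, number of defective samples)."""
    sq = [(w[0] * yc + w[1] * (yc - yp) - t) ** 2 for yp, yc, t in samples]
    rmse = (sum(sq) / len(sq)) ** 0.5
    defects = sum(is_defective([yp, yc] + progress(w, yp, yc))
                  for yp, yc, t in samples)
    return rmse, defects


def select(results, max_defects):
    """Assumed rule: lowest RMSE among candidates within the defect limit."""
    ok = [r for r in results if r["defects"] <= max_defects]
    if ok:
        return min(ok, key=lambda r: r["rmse"])
    return min(results, key=lambda r: (r["defects"], r["rmse"]))


train = (triples([100, 104, 107, 109, 110])
         + triples([120, 117, 115, 114, 113.5])
         + triples([90, 95, 93, 96, 95]))
valid = triples([105, 108, 110, 111]) + triples([130, 126, 123, 121])

search_set = [0.01, 1.0, 100.0, 1e6]  # candidate regularization parameters
results = []
for lam in search_set:
    w = fit(train, lam)
    rmse, defects = evaluate(w, valid)
    results.append({"lam": lam, "w": w, "rmse": rmse, "defects": defects})

chosen = select(results, max_defects=0)
print(chosen["lam"], chosen["defects"], round(chosen["rmse"], 3))
```

In this sketch a larger regularization parameter shrinks the coefficient of the strong regularization variable toward zero, so that the repeated prediction is driven mainly by the main variable.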
The model storage means 66 (for example, the model storage unit 14, the model storage unit 24) stores a prediction model used for progress prediction.
The prediction means 67 (for example, the model application unit 15, the model application unit 25), when the prediction target data is given, uses the prediction model stored in the model storage means 66 to perform progress prediction.
The above exemplary embodiment can be described as the following supplementary notes.
(Supplementary note 1) A model generation system including: regularization parameter candidate setting means that outputs a search set which is a search set of regularization parameters used for regularization of a prediction model used for progress prediction performed by fixing some values of a plurality of explanatory variables, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a prediction formula of the prediction model;
model learning means that learns, using training data, a prediction model corresponding to each of the plurality of solution candidates included in the search set; accuracy evaluation means that evaluates, using predetermined verification data, a prediction accuracy of each of a plurality of the learned prediction models; transition evaluation means that evaluates, for each of the plurality of the learned prediction models, a graph shape indicated by a transition of a predicted value obtained from the prediction model or a number of defective samples, which is a number of samples for which the transition is not valid, using predetermined verification data; and model determination means that determines a prediction model used for the progress prediction from among the plurality of the learned prediction models based on an evaluation result regarding the prediction accuracy and an evaluation result regarding the graph shape or the number of defective samples.
(Supplementary note 2) The model generation system according to supplementary note 1, in which among the explanatory variables, the strong regularization variable is a variable indicating a value of a prediction target item at an arbitrary time point before a prediction reference point or a variable indicating an arbitrary statistic calculated from the variable and another variable.
(Supplementary note 3) The model generation system according to supplementary note 1 or 2, in which the prediction model is a prediction model in which a variable indicating a value of the prediction target item at a prediction time point is a target variable, and only a main variable that is a variable indicating a value of the prediction target item at a prediction reference point, the strong regularization variable, and one or more control variables that can be controlled by a person are explanatory variables, and a value of the control variable is fixed in the progress prediction.
(Supplementary note 4) The model generation system according to any one of supplementary notes 1 to 3, in which the accuracy evaluation means obtains, for each of the plurality of the learned prediction models, an index regarding a prediction accuracy based on inspection data, the transition evaluation means obtains, for each of the plurality of the learned prediction models, an index regarding the graph shape or the number of defective samples based on the inspection data, and the model determination means determines a prediction model used for the progress prediction based on the index regarding the prediction accuracy and the index regarding the graph shape or the number of defective samples.
(Supplementary note 5) A model generation system including: constrained model evaluation means that evaluates, using predetermined verification data, a prediction accuracy of a constrained model, which is one of prediction models used for progress prediction performed by fixing some values of a plurality of explanatory variables, which is a prediction model that predicts a value of a prediction target item at a prediction time point when a predicted value is obtained, and which is a prediction model in which at least a constraint that a variable other than a main variable indicating a value of the prediction target item at a prediction reference point is not used as a non-control variable that is an explanatory variable whose value changes in the progress prediction is imposed to the explanatory variable; regularization parameter candidate setting means that outputs a search set which is a search set of regularization parameters used for regularization of a calibration model, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a model formula of the calibration model, the calibration model being one of the prediction models used for the progress prediction, the calibration model being a prediction model for predicting a calibration value for calibrating the predicted value obtained in the constrained model for arbitrary prediction target data, and the calibration model being a prediction model including two or more non-control variables and one or more control variables that can be controlled by a person in the explanatory variables; model learning means that learns, using training data, a calibration model corresponding to each of the plurality of solution candidates included in the search set;
accuracy evaluation means that evaluates, using predetermined verification data, a prediction accuracy of each of the plurality of learned calibration models; transition evaluation means that evaluates, for each of the plurality of learned calibration models, a graph shape indicated by a transition of the predicted value after calibration obtained as a result of calibrating the predicted value obtained from the constrained model with the calibration value obtained from the calibration model or the number of defective samples, which is the number of samples for which the transition is not valid, using predetermined verification data; and model determination means that determines a calibration model used for the progress prediction from among the plurality of learned calibration models based on an index regarding the prediction accuracy and an index regarding the graph shape or the number of defective samples.
(Supplementary note 6) The model generation system according to any one of supplementary notes 1 to 5, in which the index regarding the graph shape is an invalidity score calculated based on an error between a curve model obtained by fitting series data into a predetermined function form and the series data, the series data including data indicating a predicted value at each prediction time point obtained by the previous progress prediction, and the series data including three or more pieces of data indicating a value of the prediction target item in association with time.
(Supplementary note 7) A prediction system including: regularization parameter candidate setting means that outputs a search set which is a search set of regularization parameters used for regularization of a prediction model used for progress prediction performed by fixing some values of a plurality of explanatory variables, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a prediction formula of the prediction model; model learning means that learns, using training data, a prediction model corresponding to each of the plurality of solution candidates included in the search set; accuracy evaluation means that evaluates, using predetermined verification data, a prediction accuracy of each of a plurality of the learned prediction models; transition evaluation means that evaluates, for each of the plurality of the learned prediction models, a graph shape indicated by a transition of a predicted value obtained from the prediction model or a number of defective samples, which is a number of samples for which the transition is not valid, using predetermined verification data; model determination means that determines a prediction model used for the progress prediction from among the plurality of the learned prediction models based on an evaluation result regarding the prediction accuracy and an evaluation result regarding the graph shape or the number of defective samples; model storage means that stores a prediction model used for the progress prediction; and prediction means that when prediction target data is given, performs the progress prediction using the prediction model stored in the model storage means.
(Supplementary note 8) The prediction system according to supplementary note 7, in which the model determination means determines, before the progress prediction is performed, a prediction model used for the progress prediction, and the model storage means stores the prediction model determined by the model determination means.
(Supplementary note 9) The prediction system according to supplementary note 7, in which the model storage means stores a plurality of the prediction models learned by the model learning means, the prediction means performs the progress prediction using each of the plurality of prediction models stored in the model storage means to acquire a predicted value at each prediction time point in a period targeted for the current progress prediction from each of the prediction models, the transition evaluation means, based on verification data including prediction target data in the current progress prediction, evaluates a graph shape at the time of the past progress prediction including the time of the current progress prediction, or the number of defective samples at the time of the past progress prediction including the time of the current progress prediction, and the model determination means, based on the index regarding the prediction accuracy and the index regarding the graph shape or the number of defective samples, determines a prediction model to use for the predicted value at each prediction time point in the current progress prediction from among the plurality of prediction models stored in the model storage means.
(Supplementary note 10) The prediction system according to any one of supplementary notes 7 to 9, further including shipping determination means that performs, when series data is input, evaluation regarding at least the graph shape or the number of defective samples on the series data, a predicted value included in the series data, or a prediction model that has obtained the predicted value, and performs shipping determination of the predicted value based on the evaluation result, the series data being the index regarding the graph shape that includes data indicating a predicted value at each prediction time point obtained by the previous progress prediction, and the series data including two or more pieces of data indicating a value of the prediction target item in association with time.
(Supplementary note 11) A model generation method including: outputting a search set which is a search set of regularization parameters used for regularization of a prediction model used for progress prediction performed by fixing some values of a plurality of explanatory variables, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a prediction formula of the prediction model; learning, using training data, a prediction model corresponding to each of the plurality of solution candidates included in the search set; evaluating, for each of the plurality of the learned prediction models, each of a prediction accuracy and a graph shape indicated by a transition of a predicted value obtained from the prediction model or a number of defective samples, which is a number of samples for which the transition is not valid, using predetermined verification data; and determining a prediction model used for the progress prediction from among the plurality of the learned prediction models based on an evaluation result regarding the prediction accuracy and an evaluation result regarding the graph shape or the number of defective samples.
(Supplementary note 12) A model generation program for causing a computer to execute the processes of: outputting a search set which is a search set of regularization parameters used for regularization of a prediction model used for progress prediction performed by fixing some values of a plurality of explanatory variables, and in which a plurality of solution candidates are set, the solution candidates having at least mutually different values of a regularization parameter that affects a term of a strong regularization variable that is one or more variables specifically defined among the explanatory variables used for a prediction formula of the prediction model; learning, using training data, a prediction model corresponding to each of the plurality of solution candidates included in the search set; evaluating, for each of the plurality of the learned prediction models, each of a prediction accuracy and a graph shape indicated by a transition of a predicted value obtained from the prediction model or a number of defective samples, which is a number of samples for which the transition is not valid, using predetermined verification data; and determining a prediction model used for the progress prediction from among the plurality of the learned prediction models based on an evaluation result regarding the prediction accuracy and an evaluation result regarding the graph shape or the number of defective samples.
Although the present invention has been described above with reference to the exemplary embodiments and examples, the present invention is not limited to the above-described exemplary embodiments and examples. Various modifications that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
This application claims priority based on Japanese Patent Application No. 2018-068278 filed on Mar. 30, 2018, the disclosure of which is incorporated herein in its entirety.
The present invention can be suitably applied to any application that obtains, by fixing some values of a plurality of explanatory variables, a predicted value at each prediction time point included in a predetermined evaluation target period that is specified as a progress prediction target and that includes one or more prediction time points.
Number | Date | Country | Kind |
---|---|---|---|
2018-068278 | Mar 2018 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2018/045610 | 12/12/2018 | WO | 00 |