The present invention relates to a data estimation technique for estimating an objective variable from an explanatory variable.
As a machine learning method, decision tree learning, which creates a classifier having a tree structure from training data including an explanatory variable and an objective variable, is used. A classification result can be predicted for unknown input data using a learned decision tree. Further, a random forest is also used, in which a plurality of decision trees are learned with randomly varied training data and a prediction is made by majority decision to enhance generalization ability.
The learning device disclosed in Patent Literature 1 creates, using pieces of training data each including explanatory variables and an objective variable, a plurality of decision trees, each of which is configured by a combination of the explanatory variables and estimates the objective variable on the basis of whether the explanatory variables are true or false. The learning device then creates a linear model that is equivalent to the plurality of decision trees and lists, without omission, all terms configured by a combination of the explanatory variables, and uses the linear model to output a stable prediction result from input data.
[Patent Literature 1] JP 2020-46891 A
Machine learning for predicting an objective variable from an explanatory variable has a problem that accuracy in objective variable estimation reaches a plateau.
The present invention has been made in view of such a problem, and it is therefore an object of the present invention to provide a data estimation technique capable of improving accuracy in estimation of an objective variable from an explanatory variable.
In order to solve the above-described problem, a data estimation device according to one aspect of the present invention includes a learning unit that creates, using training data including an explanatory variable and an objective variable, a machine learning model that estimates an objective variable from an explanatory variable. The learning unit creates a machine learning model Mi that estimates an objective variable Oi from an explanatory variable group Ei including one or more explanatory variables, sets a new explanatory variable group Ei+1 by adding the objective variable Oi estimated by the machine learning model Mi to the explanatory variable group Ei, and creates a machine learning model Mi+1 that estimates an objective variable Oi+1 from the explanatory variable group Ei+1 (where i=1).
Another aspect of the present invention is a data estimation method. The method includes a learning process of creating, using training data including an explanatory variable and an objective variable, a machine learning model that estimates an objective variable from an explanatory variable. The learning process includes creating a machine learning model Mi that estimates an objective variable Oi from an explanatory variable group Ei including one or more explanatory variables, setting a new explanatory variable group Ei+1 by adding the objective variable Oi estimated by the machine learning model Mi to the explanatory variable group Ei, and creating a machine learning model Mi+1 that estimates an objective variable Oi+1 from the explanatory variable group Ei+1 (where i=1).
Note that any combination of the above-described components, or an entity that results from replacing expressions of the present invention among a method, a device, a system, a computer program, a data structure, a recording medium, and the like is also valid as an aspect of the present invention.
According to the present invention, it is possible to increase accuracy in estimation of an objective variable from an explanatory variable.
In a training phase, the data estimation device 100 creates, using training data including explanatory variables and objective variables, a machine learning model that estimates an objective variable from an explanatory variable. In a prediction phase, the data estimation device 100 inputs an unknown explanatory variable to the created machine learning model to predict an objective variable.
First, a configuration and operation of the data estimation device 100 in the training phase will be described. A value of an explanatory variable and a value of an objective variable are given to the learning unit 20 as training data. The learning unit 20 creates, using the given training data, a machine learning model that estimates an objective variable from an explanatory variable, and stores the machine learning model into the learning model storage unit 70. As a machine learning method, a regression model, a decision tree, a random forest, Bayesian estimation, a neural network, or the like may be used.
The objective variable output unit 30 outputs the value of the objective variable estimated from the value of the explanatory variable on the basis of the learned machine learning model, and stores the value of the objective variable into the objective variable storage unit 80. The explanatory variable adding unit 40 newly adds the objective variable estimated on the basis of the machine learning model to the explanatory variable and stores the explanatory variable into the explanatory variable storage unit 60.
The explanatory variable input unit 10 reads the newly set explanatory variable from the explanatory variable storage unit 60 and supplies the newly set explanatory variable to the learning unit 20. The learning unit 20 creates a machine learning model that estimates an objective variable from the newly set explanatory variable, and stores the machine learning model into the learning model storage unit 70. Subsequently, a machine learning model is repeatedly created with the estimated objective variable newly added to the explanatory variable.
Next, a configuration and operation of the data estimation device 100 in the prediction phase will be described. In the prediction phase, the learning unit 20 functions as a prediction unit.
The explanatory variable input unit 10 reads the value of the explanatory variable stored in the explanatory variable storage unit 60 as unknown data, and gives the explanatory variable to the learning unit 20.
The learning unit 20 reads the machine learning model stored in the learning model storage unit 70 and estimates an objective variable from the explanatory variable on the basis of the machine learning model. The objective variable output unit 30 stores the estimated value of the objective variable into the objective variable storage unit 80.
The explanatory variable adding unit 40 newly adds the objective variable estimated on the basis of the machine learning model to the explanatory variable and stores the explanatory variable into the explanatory variable storage unit 60.
The explanatory variable input unit 10 reads the newly set explanatory variable from the explanatory variable storage unit 60 and supplies the newly set explanatory variable to the learning unit 20. The learning unit 20 estimates an objective variable from the newly set explanatory variable on the basis of the machine learning model stored in the learning model storage unit 70. Subsequently, an objective variable is repeatedly estimated using the machine learning model with the estimated objective variable newly added to the explanatory variable.
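The prediction phase described above can be sketched as follows. This is a minimal illustration, not the implementation of the data estimation device 100: the helper names `make_linear_model` and `predict_chain`, the use of linear models, and all coefficient values are assumptions introduced only for the example.

```python
from typing import Callable, List, Sequence

# A "model" here is any callable that maps a list of features to one estimate.
Model = Callable[[Sequence[float]], float]


def make_linear_model(weights: Sequence[float], bias: float) -> Model:
    """Hypothetical stand-in for a stored machine learning model M_i."""
    def predict(features: Sequence[float]) -> float:
        if len(features) != len(weights):
            raise ValueError("feature count does not match the model")
        return sum(w * x for w, x in zip(weights, features)) + bias
    return predict


def predict_chain(models: List[Model], explanatory: Sequence[float]) -> List[float]:
    """Estimate the objective variables in order, feeding each estimate forward."""
    features = list(explanatory)       # explanatory variable group E_1
    estimates = []
    for model in models:               # M_1, M_2, ... read from storage
        estimate = model(features)     # estimate O_i from E_i
        estimates.append(estimate)
        features.append(estimate)      # E_{i+1} = E_i plus the estimated O_i
    return estimates


# Made-up coefficients: M_1 takes 2 inputs, M_2 takes 3 (the 2 inputs plus O_1).
m1 = make_linear_model([1.0, 2.0], 0.5)
m2 = make_linear_model([0.0, 1.0, 2.0], -1.0)
print(predict_chain([m1, m2], [3.0, 4.0]))  # prints [11.5, 26.0]
```

Each model must have been trained on the same number of inputs that it receives here, which is why the models are applied in the same order in which they were created.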
The evaluation item display unit 50 calculates and displays a value of each evaluation item on the basis of the estimated value of the objective variable. The evaluation item display unit 50 may calculate the value of each evaluation item on the basis of the value of the explanatory variable and the estimated value of the objective variable.
It is assumed that one or more explanatory variables and a plurality of objective variables Oi (i=1 to n) are given as training data.
The explanatory variable input unit 10 sets the one or more explanatory variables to an explanatory variable group E1 (S10). The learning unit 20 sets a variable i to 1 (S20).
The learning unit 20 creates a learning model Mi that estimates an objective variable Oi from an explanatory variable group Ei, and the objective variable output unit 30 outputs the estimated objective variable Oi to the objective variable storage unit 80 (S30).
The explanatory variable adding unit 40 adds the objective variable Oi estimated by the learning model Mi to the explanatory variable group Ei to set a new explanatory variable group Ei+1, and stores the explanatory variable group Ei+1 into the explanatory variable storage unit 60 (S40).
The learning unit 20 increments the variable i by 1 (S50). In a case where the variable i is greater than n (Y in S60), the processing of creating a machine learning model is brought to an end. In a case where the variable i is less than or equal to n (N in S60), the processing returns to step S30, and the subsequent procedure is repeated.
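Steps S10 to S60 can be sketched as a loop. The sketch below assumes NumPy and uses a least-squares fit on a quadratic feature map as a hypothetical regression model; the function names, the feature map, and the synthetic data are assumptions for illustration, and any of the machine learning methods mentioned earlier could take the place of `fit` and `predict`.

```python
import numpy as np


def expand(X):
    """Quadratic feature map [X, X**2, 1]; a hypothetical choice of model."""
    return np.column_stack([X, X ** 2, np.ones(len(X))])


def fit(X, y):
    """Create a model (least-squares coefficients) that estimates y from X."""
    coef, *_ = np.linalg.lstsq(expand(X), y, rcond=None)
    return coef


def predict(coef, X):
    return expand(X) @ coef


def train_chain(E1, objectives):
    """Create M_1..M_n, appending each estimated O_i to the explanatory group."""
    E = np.asarray(E1, dtype=float)          # S10: explanatory variable group E_1
    models = []
    for O in objectives:                     # S20, S50, S60: i = 1 .. n
        coef = fit(E, O)                     # S30: create M_i from E_i
        O_hat = predict(coef, E)             # S30: estimated O_i
        models.append(coef)
        E = np.column_stack([E, O_hat])      # S40: E_{i+1} = E_i plus estimated O_i
    return models, E


# Synthetic data, made up for the example only.
rng = np.random.default_rng(0)
E1 = rng.normal(size=(50, 3))
O1 = E1[:, 0] * E1[:, 1] + rng.normal(scale=0.1, size=50)
O2 = O1 ** 2 + E1[:, 2]
models, E_final = train_chain(E1, [O1, O2])
print(len(models), E_final.shape)  # prints 2 (50, 5)
```

Note that the column appended in S40 is the estimate produced by M_i, not the measured value of O_i, so the same chain can be run in the prediction phase where measured objective variables are unavailable. With a purely linear model the appended column would be a linear combination of the existing columns, which is one reason the sketch uses a nonlinear feature map.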
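As described above, in a case where explanatory variables are unchanged and there are a plurality of objective variables, it is possible to increase the accuracy in objective variable estimation by the following procedures (1) to (3):
(1) Create a machine learning model M1 that estimates an objective variable O1 from the explanatory variable group E1.
(2) Add the objective variable O1 estimated by the machine learning model M1 to the explanatory variable group E1 to set a new explanatory variable group E2, and create a machine learning model M2 that estimates an objective variable O2 from the explanatory variable group E2.
(3) Repeat procedure (2) for the remaining objective variables up to On.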
Next, a case where secondary data on running performance is learned and predicted from primary data on running of a runner and physical characteristic data on the runner using the data estimation device 100 will be described as an example.
The primary data on running that is measurable using a sensor attached to shoes of the runner includes a running pace, a stride, a pitch, a grounding time, and a hang time. The physical characteristic data on the runner includes height and weight. Such measurable primary data and physical characteristic data serve as an explanatory variable.
The secondary data on running performance to be estimated includes a second peak value (denoted as “Fz 2nd max”) of a z component of a ground reaction force (denoted as “Fz”), a propulsion force product, a braking force product, and a rising rate of the z component of the ground reaction force (denoted as “Fz Loading Rate”). Such secondary data serves as an objective variable.
A second peak value of the z component Fz of the ground reaction force is Fz 2nd max, and a slope of the rise of the z component Fz of the ground reaction force is Fz Loading Rate. An area of a region where the y component of the ground reaction force has a positive value is the propulsion force product, and an area of a region where the y component of the ground reaction force has a negative value is the braking force product.
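As an illustration of the area definitions above, the propulsion force product and the braking force product can be approximated from sampled values of the y component of the ground reaction force. The sketch below is an assumption-laden example: the function name `force_products`, the trapezoidal rule, the sample values, and the convention of reporting the braking force product as a positive area are all introduced here for illustration.

```python
def force_products(t, fy):
    """Return (propulsion, braking) areas of the y component over time.

    t  : sample times
    fy : y component of the ground reaction force at each sample time
    """
    if len(t) != len(fy):
        raise ValueError("t and fy must have the same length")
    propulsion = 0.0
    braking = 0.0
    for k in range(len(t) - 1):
        dt = t[k + 1] - t[k]
        # Clip each sample so that only the positive (or negative) part counts.
        pos = (max(fy[k], 0.0) + max(fy[k + 1], 0.0)) / 2.0
        neg = (min(fy[k], 0.0) + min(fy[k + 1], 0.0)) / 2.0
        propulsion += pos * dt
        braking += -neg * dt   # reported as a positive area (assumption)
    return propulsion, braking


# Made-up samples: a negative (braking) lobe followed by a positive (propulsion) lobe.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
fy = [0.0, -2.0, 0.0, 4.0, 0.0]
print(force_products(t, fy))  # prints (4.0, 2.0)
```

Clipping before integrating is only approximate when a sign change falls between two samples; a finer sampling interval reduces that error.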
Hereinafter, for convenience of description, Fz 2nd max is referred to as secondary data A, the propulsion force product is referred to as secondary data B, the braking force product is referred to as secondary data C, and Fz Loading Rate is referred to as secondary data D. Such pieces of secondary data A to D are also referred to as objective variables A to D.
As an example, the pieces of secondary data A, B, C, D that are objective variables are estimated in this order from the primary data and the physical characteristic data that are explanatory variables using the machine learning model, and the estimated pieces of secondary data A, B, C, D are added to the explanatory variables in this order. The order in which the objective variables are estimated, in other words, the order in which the estimated objective variables are input to the explanatory variables, may be different from the above-described order, and how to determine an input order that makes the accuracy in objective variable estimation higher will be described later.
The primary data on running is acquired from the sensor attached to the shoes of the runner and is stored into the explanatory variable storage unit 60 together with the physical characteristic data on the runner (S100).
The learning unit 20 inputs, to a regression model, the primary data and the physical characteristic data as explanatory variables to estimate the secondary data A on running performance as an objective variable (S110).
The explanatory variable adding unit 40 newly adds the estimated secondary data A to the explanatory variables, and the learning unit 20 inputs, to the regression model, the primary data, the physical characteristic data, and the secondary data A as explanatory variables to estimate the secondary data B as an objective variable (S120).
The explanatory variable adding unit 40 newly adds the estimated secondary data B to the explanatory variables, and the learning unit 20 inputs, to the regression model, the primary data, the physical characteristic data, the secondary data A, and the secondary data B as explanatory variables to estimate the secondary data C as an objective variable (S130).
The explanatory variable adding unit 40 newly adds the estimated secondary data C to the explanatory variables, and the learning unit 20 inputs, to the regression model, the primary data, the physical characteristic data, the secondary data A, the secondary data B, and the secondary data C as explanatory variables to estimate the secondary data D as an objective variable (S140).
The evaluation item display unit 50 calculates and displays the value of each evaluation item on the basis of the primary data and the secondary data A to D (S150).
The objective variable A is estimated first, that is, only from the explanatory variables, so its coefficient of determination is 0.84 both in a case where a prediction is made only with the explanatory variables and under the present technique.
The coefficient of determination of the objective variable B is 0.67 in a case where a prediction is made only with the explanatory variables, and is improved to 0.69 in a case where a prediction is made with the estimated objective variable A added to the explanatory variables.
The coefficient of determination of the objective variable C is 0.36 in a case where a prediction is made only with the explanatory variables, and is improved to 0.52 in a case where a prediction is made with the estimated objective variable B further added to the explanatory variables.
The coefficient of determination of the objective variable D is 0.69 in a case where a prediction is made only with the explanatory variables, and is improved to 0.74 in a case where a prediction is made with the estimated objective variable C further added to the explanatory variables.
In the example, sequentially adding each objective variable estimated by the regression model to the explanatory variables and estimating the next objective variable by the regression model makes it possible to improve the accuracy in objective variable estimation.
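The coefficients of determination compared above are assumed here to follow the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; the description does not state the formula, so this is an assumption. A minimal sketch with made-up values:

```python
def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot for measured values and their estimates."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up measured values and estimates, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]
print(round(coefficient_of_determination(measured, estimated), 4))  # prints 0.98
```

A value of 1 means the estimates reproduce the measured values exactly, so an increase from 0.36 to 0.52 for the objective variable C indicates that a larger share of its variance is explained.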
Next, a method for further improving the accuracy in objective variable estimation by changing the order in which the plurality of objective variables are input to the explanatory variables will be described.
In order to determine an order in which n objective variables are input as explanatory variables, a machine learning model is created in each input order to calculate accuracy in prediction about the n objective variables, and an input order in which the mean value of the accuracy in prediction about the n objective variables becomes the largest or the standard deviation of the accuracy in prediction about the n objective variables becomes the smallest is finally selected as an optimum input order.
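The exhaustive search over input orders can be sketched as follows. This is an illustrative sketch only: it assumes NumPy, a least-squares fit on a quadratic feature map as a hypothetical regression model, synthetic data, and evaluation on a held-out half of the data, none of which is specified in the description. It selects by the largest mean value; selecting by the smallest standard deviation would only change the key used in `best_order`.

```python
import itertools
import numpy as np


def expand(X):
    """Quadratic feature map [X, X**2, 1]; a hypothetical choice of model."""
    return np.column_stack([X, X ** 2, np.ones(len(X))])


def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def score_order(order, E_train, O_train, E_test, O_test):
    """Train the chain in the given order; return held-out R^2 per objective."""
    E_tr, E_te = E_train, E_test
    scores = {}
    for name in order:
        coef, *_ = np.linalg.lstsq(expand(E_tr), O_train[name], rcond=None)
        hat_tr, hat_te = expand(E_tr) @ coef, expand(E_te) @ coef
        scores[name] = r2(O_test[name], hat_te)
        # Append the estimated (not the measured) objective to both sets.
        E_tr = np.column_stack([E_tr, hat_tr])
        E_te = np.column_stack([E_te, hat_te])
    return scores


def best_order(E_train, O_train, E_test, O_test):
    """Try every input order and keep the one with the largest mean R^2."""
    results = []
    for order in itertools.permutations(O_train):
        scores = score_order(order, E_train, O_train, E_test, O_test)
        results.append((np.mean(list(scores.values())), order))
    return max(results, key=lambda r: r[0])


# Synthetic data with three objectives, made up for the example only.
rng = np.random.default_rng(0)
E = rng.normal(size=(200, 2))
O = {
    "A": E[:, 0] + 0.1 * rng.normal(size=200),
    "B": E[:, 0] * E[:, 1] + 0.1 * rng.normal(size=200),
    "C": E[:, 1] ** 2 + 0.1 * rng.normal(size=200),
}
train, test = slice(0, 100), slice(100, 200)
O_train = {k: v[train] for k, v in O.items()}
O_test = {k: v[test] for k, v in O.items()}
mean_r2, order = best_order(E[train], O_train, E[test], O_test)
print(order, round(float(mean_r2), 3))
```

The number of orders grows as n factorial, so this search is practical only for a small number of objective variables.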
In the above-described example, the accuracy in prediction about the four objective variables A to D was evaluated for all 24 input orders of the four objective variables A to D. In a case where the four objective variables were input in the order of A, D, B, C, the coefficients of determination of the objective variables A, B, C, D were 0.84, 0.70, 0.55, and 0.71, respectively, the mean value of the coefficients of determination of the four objective variables A to D was 0.7000, and the standard deviation of the coefficients of determination of the four objective variables A to D was 0.1186. Among the 24 input orders, the order of A, D, B, C gave both the largest mean value and the smallest standard deviation of the coefficients of determination, and is therefore selected as the optimum input order.
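The figures quoted above can be checked with a few lines of standard-library Python. The variable names are assumptions for the example; the four coefficients of determination are the ones stated for the input order A, D, B, C. The reported standard deviation of 0.1186 is reproduced by the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
import itertools
import statistics

# Coefficients of determination reported for the input order A, D, B, C.
r2_by_objective = {"A": 0.84, "B": 0.70, "C": 0.55, "D": 0.71}
values = list(r2_by_objective.values())

mean_r2 = statistics.mean(values)
sd_r2 = statistics.stdev(values)          # sample standard deviation (n - 1)
n_orders = len(list(itertools.permutations(r2_by_objective)))

print(round(mean_r2, 4), round(sd_r2, 4), n_orders)  # prints 0.7 0.1186 24
```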
The optimum input order of the objective variables can be derived on the basis of correlation coefficients between the objective variables without testing all the input orders. Next, a method for determining the optimum input order of the objective variables will be described.
(Step 1) Select an objective variable that is the highest in prediction accuracy when a regression model is built only with explanatory variables, and build a regression model with the objective variable added to the explanatory variables.
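In the example, the objective variable A, whose coefficient of determination of 0.84 is the highest among the objective variables A to D in a case where a prediction is made only with the explanatory variables, is selected first.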
(Step 2) Obtain correlation coefficients between the selected objective variable and all the remaining objective variables, select the remaining objective variable having the largest absolute value of the correlation coefficient, and build the regression model with that objective variable newly added to the explanatory variables. Here, in a case where a plurality of objective variables have already been selected, the mean of the absolute values of the correlation coefficients with the selected objective variables is obtained for each remaining objective variable.
In the example, the objective variable D is selected second, consistent with the optimum input order of A, D, B, C.
(Step 3) Repeat step 2.
In the example, the objective variable B is selected third, consistent with the optimum input order of A, D, B, C.
(Step 4) Repeat step 2 in a case where an objective variable still remains.
In the example, the regression model is built with the last objective variable C newly added to the explanatory variables.
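Steps 1 to 4 can be sketched as a greedy selection. In the sketch below, the function name `greedy_input_order` is hypothetical, the baseline accuracies are the coefficients of determination stated earlier for predictions made only with the explanatory variables, and the correlation coefficients are placeholder values made up for illustration; they are not taken from the description.

```python
def greedy_input_order(baseline_accuracy, corr):
    """Derive an input order from baseline accuracy and pairwise correlations."""
    def abs_corr(a, b):
        # The correlation table stores each unordered pair once.
        return abs(corr[(a, b)] if (a, b) in corr else corr[(b, a)])

    remaining = set(baseline_accuracy)
    # Step 1: start with the objective predicted best from explanatory variables.
    first = max(sorted(remaining), key=baseline_accuracy.get)
    order = [first]
    remaining.remove(first)

    # Steps 2 to 4: pick the remaining objective with the largest mean
    # absolute correlation to the objectives selected so far.
    while remaining:
        def mean_abs_corr(candidate):
            return sum(abs_corr(candidate, s) for s in order) / len(order)
        chosen = max(sorted(remaining), key=mean_abs_corr)
        order.append(chosen)
        remaining.remove(chosen)
    return order


# Coefficients of determination with explanatory variables only (from the example).
baseline = {"A": 0.84, "B": 0.67, "C": 0.36, "D": 0.69}
# Placeholder correlation coefficients, made up for this sketch.
placeholder_corr = {
    ("A", "B"): 0.5, ("A", "C"): 0.2, ("A", "D"): -0.8,
    ("B", "C"): 0.3, ("B", "D"): 0.6, ("C", "D"): 0.1,
}
print(greedy_input_order(baseline, placeholder_corr))  # prints ['A', 'D', 'B', 'C']
```

Compared with testing all input orders, this procedure needs only the baseline accuracies and the pairwise correlation coefficients, so its cost grows far more slowly with the number of objective variables.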
As described above, the data estimation device 100 of the present embodiment can improve the accuracy in objective variable estimation by first building, in a case where there are a plurality of objective variables to be predicted from explanatory variables, a machine learning model capable of estimating any objective variable using the explanatory variables, and then repeatedly building, with an estimated objective variable added to the explanatory variables, a machine learning model capable of estimating the next objective variable. Note that the present invention is applicable to not only an example where an objective variable is added to explanatory variables on a one-by-one basis but also an example where a plurality of objective variables are added to the explanatory variables.
The present invention has been described on the basis of the embodiment. It is to be understood by those skilled in the art that the embodiment is illustrative and that various modifications are possible for a combination of components or processes, and that such modifications are also within the scope of the present invention.
Although the example where the secondary data on running performance is estimated from the primary data on running of the runner and the physical characteristic data on the runner has been described, the present invention is applicable to any example as long as an objective variable is estimated from an explanatory variable.
The present invention is applicable to a data estimation technique.
10 explanatory variable input unit, 20 learning unit, 30 objective variable output unit, 40 explanatory variable adding unit, 50 evaluation item display unit, 60 explanatory variable storage unit, 70 learning model storage unit, 80 objective variable storage unit, 100 data estimation device
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/023286 | 6/12/2020 | WO |