The present invention relates to an evaluation system, an evaluation method, and an evaluation program for evaluating a result of optimization based on prediction.
In recent years, data-driven decision making has received a great deal of attention, and has been used in many practical applications. One of the most promising approaches is mathematical optimization based on prediction models generated by machine learning. Recent advances in machine learning have made it easier to create accurate prediction models, and prediction results have been used to build mathematical optimization problems. In the following description, such a problem is referred to as predictive mathematical optimization, or simply as prediction optimization.
These approaches are used in applications where frequent trial and error processes are not practical, such as water distribution optimization, energy generation planning, retail price optimization, supply chain management, and portfolio optimization.
One of the important features of prediction optimization is that, unlike standard optimization, the objective function is estimated by machine learning. For example, in price optimization based on prediction, since future profits are unknown in advance, a function for predicting profit is estimated from a regression model of demand as a function of product prices.
PTL 1 discloses an ordering plan determination device that determines an ordering plan for a product. The ordering plan determination device described in PTL 1 calculates a combination of a price and an order quantity of a product for maximizing a profit, by predicting a demand of the product for each price, and using the predicted demand to solve an objective function optimization problem with a price and an order quantity as an input and a profit as an output.
NPL 1 describes a method of determining an appropriate discount for a given Sharpe ratio.
PTL 1: Japanese Patent Application Laid-Open No. 2016-110591
NPL 1: Harvey, Campbell R and Liu, Yan, “Backtesting”, SSRN Electronic Journal, 2015
As a specific method of determining a strategy on the basis of prediction, as described in PTL 1, there is a means that creates a prediction model on the basis of observed data and calculates an optimal strategy on the basis of the prediction model. At this time, it is important to estimate the effect of the optimized result. One simple method for evaluating the effect is to estimate the effect of the optimal solution by using the prediction model used for the optimization.
Here, an objective function f(z, θ̂) estimated for a (true) objective function f(z, θ*) representing the reality itself is assumed. Note that, in the present specification, a hat symbol ^ may be attached to a symbol; for example, the estimate of θ is written as θ̂.
Here, z is a decision variable, and θ is a parameter of f. Further, the estimated optimal strategy is denoted by ẑ. That is,

[Formula 1]
ẑ = arg max_{z∈Z} f(z, θ̂)

is satisfied. Here, Z is the feasible region over which z ranges.
In prediction optimization, it is difficult to observe f(ẑ, θ*) because doing so requires executing the strategy ẑ in a real environment. Therefore, in order to evaluate a characteristic of ẑ, f(ẑ, θ*) is generally estimated with f(ẑ, θ̂).
However, as described in NPL 1, f(ẑ, θ̂) tends to be highly optimistic in algorithmic investment and portfolio optimization. In other words, an optimal value based on estimation is generally biased in an optimistic direction.
According to the description of NPL 1, a common method for evaluating a trading strategy is a simple heuristic of discounting the estimate by 50%. That is, in NPL 1, 0.5·f(ẑ, θ̂) is regarded as an estimator of f(ẑ, θ*). Further, recent studies have proposed statistically analyzed algorithms that mitigate the problem.
However, these algorithms are limited to specific applications (for example, algorithmic investment). Furthermore, in a general prediction optimization problem, there is no principled algorithm for obtaining an unbiased estimator of f(ẑ, θ*).
Therefore, an object of the present invention is to provide an evaluation system, an evaluation method, and an evaluation program capable of performing an evaluation while suppressing an optimistic bias in prediction optimization.
An evaluation system according to the present invention includes: a learning unit that generates a plurality of sample groups from samples to be used for learning, and generates a plurality of prediction models while inhibiting overlapping of a sample group to be used for learning among generated sample groups; an optimization unit that generates an objective function, based on an explained variable predicted by the prediction model and based on a constraint condition for optimization, and optimizes a generated objective function; and an evaluation unit that evaluates an optimization result by using a sample group that has not been used in learning of a prediction model used for generating an objective function targeted for the optimization.
An evaluation method according to the present invention includes: generating a plurality of sample groups from samples to be used for learning; generating a plurality of prediction models while inhibiting overlapping of a sample group to be used for learning among generated sample groups; generating an objective function, based on an explained variable predicted by the prediction model and based on a constraint condition for optimization; optimizing a generated objective function; and evaluating an optimization result by using a sample group that has not been used in learning of a prediction model used for generating an objective function targeted for the optimization.
An evaluation program according to the present invention causes a computer to execute: a learning process of generating a plurality of sample groups from samples to be used for learning, and generating a plurality of prediction models while inhibiting overlapping of a sample group to be used for learning among generated sample groups; an optimization process of generating an objective function, based on an explained variable predicted by the prediction model and based on a constraint condition for optimization, and optimizing a generated objective function; and an evaluation process of evaluating an optimization result by using a sample group that has not been used in learning of a prediction model used for generating an objective function targeted for the optimization.
According to the present invention, it is possible to perform evaluation while suppressing an optimistic bias in prediction optimization.
First, an optimistic bias in an optimal value will be described with use of a specific example. Here, in order to simplify the description, a case will be described in which an expected value of a profit in a coin toss game is estimated. In the coin toss game described here, it is predicted whether a head (H) or a tail (T) will be obtained when tossing a coin, one dollar is earned when the prediction is correct, and nothing is earned when the prediction is incorrect.
Here, when three trials are performed, there are four patterns of (1) three times head (HHH), (2) two times head and one time tail (HHT), (3) one time head and two times tail (HTT), and (4) three times tail (TTT). In these four patterns, probabilities of obtaining the head are estimated to be (1) 1, (2) 2/3, (3) 1/3, and (4) 0, respectively.
Taking into account the probability of obtaining the head in each pattern, it is optimal to bet on the head in patterns (1) and (2), and on the tail in patterns (3) and (4). With such betting, the expected profit in pattern (1) is calculated as 1×$1=$1, in pattern (2) as 2/3×$1≈$0.67, in pattern (3) as (1−1/3)×$1≈$0.67, and in pattern (4) as (1−0)×$1=$1. If the true probability of the head is 1/2, the probabilities of observing patterns (1), (2), (3), and (4) are 1/8, 3/8, 3/8, and 1/8, respectively. Therefore, the expected value of the estimated expected profit over the optimal solutions of these four patterns is calculated as 1×1/8+0.67×3/8+0.67×3/8+1×1/8≈$0.75. This is the expected value of the estimated profit when the optimal solution is selected on the basis of prediction.
However, when a coin is tossed, the probability of the head (or tail) is 1/2. Therefore, the expected profit should be 1/2×$1=$0.5. That is, it can be seen that the expected value (0.75 dollars) of the estimated value of the profit when the optimal solution is selected on the basis of prediction contains an optimistic bias as compared with the actual expected profit (0.5 dollars).
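The arithmetic above can be checked by a small simulation. The following is an illustrative sketch, not part of the claimed embodiment; the function name and experiment size are assumptions.

```python
import random

def play(n_tosses=3, p_head=0.5):
    """One experiment: estimate the head probability from three tosses,
    bet on the apparently better side, and record the profits."""
    heads = sum(random.random() < p_head for _ in range(n_tosses))
    p_hat = heads / n_tosses
    estimated_profit = max(p_hat, 1 - p_hat)  # profit expected under the estimate
    true_profit = 0.5                         # actual expectation for a fair coin
    return estimated_profit, true_profit

random.seed(0)
runs = [play() for _ in range(100_000)]
est_mean = sum(e for e, _ in runs) / len(runs)
true_mean = sum(t for _, t in runs) / len(runs)
print(f"mean estimated profit: {est_mean:.3f}")  # close to 0.75
print(f"mean true profit:      {true_mean:.3f}")  # exactly 0.50
```

The simulated mean of the estimated profit approaches the 0.75 dollars computed above, while the true expected profit stays at 0.5 dollars, exhibiting the optimistic bias.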
Next, a description is given of the reason why f(ẑ, θ̂) is not an appropriate estimator of f(ẑ, θ*) even if θ̂ is an appropriate estimator of θ*.
Suppose that the objective function f(z, θ̂) is an unbiased estimator of the true objective function f(z, θ*), that is, that the following Equation 1 is satisfied.

[Formula 2]
E_x[f(z, θ̂)] = E_x[f(z, θ*)], ∀z ∈ Z   (Equation 1)

From the equality in Equation 1, E_x[f(ẑ, θ̂)] and f(ẑ, θ̂) are each also expected to be estimators of E_x[f(ẑ, θ*)] and f(ẑ, θ*), respectively. However, the following theorem holds.
That is, suppose that Equation 1 is satisfied and that ẑ and z* each satisfy the following.

[Formula 3]
ẑ ∈ arg max_{z∈Z} f(z, θ̂)
z* ∈ arg max_{z∈Z} f(z, θ*)

In this case, the following Equation 2 is satisfied. Further, when there is a possibility that ẑ is not optimal for the true objective function f(z, θ*), the inequality on the right side of Equation 2 holds strictly.

[Formula 4]
E_x[f(ẑ, θ̂)] ≥ f(z*, θ*) ≥ E_x[f(ẑ, θ*)]   (Equation 2)
This theorem states that even if the estimated objective function f(z, θ̂) is an unbiased estimator of the true objective function, the estimated optimal value f(ẑ, θ̂) is not an unbiased estimator of f(ẑ, θ*).
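The gap stated by the theorem can be observed numerically even with unbiased estimates. The following is an illustrative sketch under assumed conditions (two equally good strategies whose values are estimated with unbiased Gaussian noise); it is not taken from the source.

```python
import random

random.seed(1)
true_values = {"A": 1.0, "B": 1.0}  # f(z, theta*): both strategies equally good

n_runs = 100_000
sum_est_opt = 0.0      # accumulates f(z_hat, theta_hat)
sum_true_of_opt = 0.0  # accumulates f(z_hat, theta*)
for _ in range(n_runs):
    # Unbiased but noisy estimates f(z, theta_hat) of each strategy's value.
    noisy = {z: v + random.gauss(0.0, 1.0) for z, v in true_values.items()}
    z_hat = max(noisy, key=noisy.get)  # the strategy that merely looks best
    sum_est_opt += noisy[z_hat]
    sum_true_of_opt += true_values[z_hat]

est_opt = sum_est_opt / n_runs          # around 1.56: optimistic
true_of_opt = sum_true_of_opt / n_runs  # exactly 1.0 here
print(est_opt, true_of_opt)
```

Although each individual estimate is unbiased, taking the maximum couples the selected strategy with its own estimation noise, so the estimated optimal value exceeds the true value of the selected strategy on average.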
This optimistic bias is empirically known in the context of portfolio optimization. For this problem, a bias correction method based on a statistical test has been proposed; however, it is applicable only when the objective function is a Sharpe ratio. Other methods are applicable to general prediction optimization problems, but they have not been shown to yield unbiased estimators.
For this problem, the inventor has found a solution based on cross-validation in empirical risk minimization (ERM). Specifically, the inventor has found a method of solving the optimistic bias problem by applying the machine-learning remedy for overfitting.
In supervised machine learning, a learning device determines a prediction rule ĥ ∈ H by minimizing an empirical error. That is, the following Equation 3 is satisfied.

[Formula 5]
ĥ = arg min_{h∈H} (1/n) Σ_{i=1}^{n} l(h, x_i)   (Equation 3)

Here, x^n = (x_1, . . . , x_n) in Equation 3 is observation data generated from a distribution D, and l is a loss function. The empirical error shown in Equation 4 below

[Formula 6]
L̂(h) := (1/n) Σ_{i=1}^{n} l(h, x_i)   (Equation 4)

is an unbiased estimator of the generalization error

[Formula 7]
L(h) := E_{x~D}[l(h, x)]

for any fixed prediction rule h. That is, for any fixed h, the following Equation 5 is satisfied.

[Formula 8]
E_{x^n}[L̂(h)] = L(h)   (Equation 5)
Despite Equation 5 described above, the empirical error at the computed rule ĥ is in most cases smaller than the generalization error at ĥ. This is because, as is well known, ĥ overfits the observed sample.
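This overfitting effect can be illustrated with a deliberately simple sketch (constant predictors under squared loss on Gaussian data; all names and sizes are assumptions, not from the source): the empirical error of the rule chosen to minimize it understates that rule's generalization error on average.

```python
import random

random.seed(5)
rules = [0.0, 0.25, 0.5, 0.75, 1.0]  # candidate constant predictors h
true_mean = 0.5                      # data ~ Normal(0.5, 1); squared loss

def emp_error(h, sample):
    """Empirical error (cf. Equation 4) of a constant predictor h."""
    return sum((x - h) ** 2 for x in sample) / len(sample)

reps, gap = 5_000, 0.0
for _ in range(reps):
    sample = [random.gauss(true_mean, 1.0) for _ in range(10)]
    h_hat = min(rules, key=lambda h: emp_error(h, sample))  # ERM choice
    # Generalization error of a constant h: E[(X - h)^2] = 1 + (0.5 - h)^2.
    gen_error = 1.0 + (true_mean - h_hat) ** 2
    gap += gen_error - emp_error(h_hat, sample)

avg_gap = gap / reps
print("average optimism of the empirical error:", avg_gap)  # positive
```

Each fixed rule's empirical error is unbiased (Equation 5), yet the minimizing rule's empirical error is systematically too small, mirroring the bias of the estimated optimal value.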
In response to this situation, the inventor has found that the cause of both the optimistic bias and overfitting in machine learning is the reuse of the same data set both for constructing the objective function and for evaluating the objective value.
Table 1 shows a comparison between empirical risk minimization (ERM) and prediction optimization.
As shown in Table 1, the problem related to prediction optimization bias has a similar structure to the problem of minimizing empirical risk. Typical methods for estimating a generalization error in machine learning are cross-validation and asymptotic bias correction such as Akaike Information Criterion (AIC).
In view of the above, in the present exemplary embodiment, an unbiased estimator is generated for the value f(ẑ, θ*) of the true objective function at the calculated strategy. That is, in the present exemplary embodiment, an estimator ρ: X^n → R that satisfies Equation 6 below is generated. Note that, in the present exemplary embodiment, θ̂ is assumed to be an unbiased estimator of θ*.

[Formula 9]
E_x[ρ(x)] = E_x[f(ẑ, θ*)]   (Equation 6)
Based on the above assumption, an exemplary embodiment of the present invention will be described with reference to the drawings. Hereinafter, price optimization based on prediction will be described with reference to specific examples. In an example of price optimization based on prediction, a predicted profit corresponds to an evaluation result.
The storage unit 10 stores learning data (hereinafter, also referred to as a sample) to be used by the learning unit 20 described later for learning. In a case of the price optimization example, past sales data, prices, and data representing factors affecting sales (hereinafter, also referred to as external factor data) are stored as learning data.
Further, the storage unit 10 stores a constraint condition when the optimization unit 30 described later performs an optimization process.
The learning unit 20 generates a prediction model for predicting variables to be used for calculation of optimization. For example, in a case of a price optimization problem for maximizing total sales, sales are calculated as a product of a price and a sales volume. Therefore, the learning unit 20 may generate a prediction model for predicting a sales volume. In the following description, an explanatory variable means a variable that may affect a prediction target. For example, when the prediction target is a sales volume, a selling price and a sales volume of a product in the past, calendar information, and the like correspond to the explanatory variable.
The prediction target is also called “objective variable” in the field of machine learning. Note that, in order to avoid confusion with “objective variable” generally used in an optimization process described below, a variable representing a prediction target will be referred to as an explained variable in the following description. Therefore, the prediction model can be said to be a model in which the explained variable is represented with use of one or more explanatory variables.
Specifically, the learning unit 20 divides a sample to be used for learning to generate a plurality of sample groups. Hereinafter, in order to simplify the description, a description is given to a case where a sample is divided into two sample groups (hereinafter, referred to as a first sample group and a second sample group). However, the number of sample groups to be generated is not limited to two, and may be three or more.
The learning unit 20 generates a prediction model by using the generated sample group. At this time, the learning unit 20 generates a plurality of prediction models while inhibiting overlapping of a sample group to be used for learning among the generated sample groups. For example, when two sample groups are generated, the learning unit 20 uses the first sample group to generate a first prediction model for predicting a sales volume of a product, and uses the second sample group to generate a second prediction model for predicting a sales volume of a product.
Any method may be adopted for the learning unit 20 to generate the prediction model. The learning unit 20 may generate the prediction model by using a machine learning engine such as factorized asymptotic Bayesian inference (FAB).
The optimization unit 30 generates an objective function on the basis of an explained variable predicted by the generated prediction model and a constraint condition of the optimization. Then, the optimization unit 30 optimizes the generated objective function. For example, when two prediction models are generated, the optimization unit 30 generates a first objective function on the basis of an explained variable predicted by the first prediction model, and generates a second objective function on the basis of an explained variable predicted by the second prediction model. Then, the optimization unit 30 optimizes the generated first and second objective functions.
Note that any method may be adopted for the optimization unit 30 to perform the optimization process. For example, in a case of a problem for maximizing expected total sales, the optimization unit 30 generates, as an objective function, a sum of products of a sales volume predicted based on the prediction model and a product price based on the constraint condition exemplified in
Note that the optimization target may be a gross profit instead of total sales.
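As a concrete illustration of such an objective function, the following sketch builds the total-sales objective for a single product from a demand prediction model and optimizes it over a finite price grid serving as the constraint condition. The linear coefficients and candidate prices are assumed values for illustration only.

```python
def predicted_volume(price, slope=-5.0, intercept=100.0):
    """Prediction model: sales volume as a linear function of price
    (illustrative coefficients, not from the source)."""
    return max(0.0, slope * price + intercept)

def sales_objective(price):
    """Objective: total sales = price x predicted sales volume."""
    return price * predicted_volume(price)

# Constraint condition: the product may take one of a few candidate prices.
candidate_prices = [6.0, 8.0, 10.0, 12.0]
best_price = max(candidate_prices, key=sales_objective)
print(best_price, sales_objective(best_price))  # 10.0 maximizes 100p - 5p^2
```

With these assumed coefficients the objective is the concave quadratic 100p − 5p², so the grid point 10.0 attains the maximum total sales of 500.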
The evaluation unit 40 evaluates a result of optimization performed by the optimization unit 30. Specifically, the evaluation unit 40 specifies a sample group that was not used in learning the prediction model from which the objective function targeted for the optimization was generated. Then, the evaluation unit 40 evaluates the optimization result by using the specified sample group.
For example, suppose that the optimization unit 30 generates a first objective function by using a first prediction model learned with use of a first sample group. At this time, the evaluation unit 40 evaluates an optimization result by using a second sample group. Similarly, suppose that the optimization unit 30 generates a second objective function by using a second prediction model learned with use of the second sample group. At this time, the evaluation unit 40 evaluates an optimization result by using the first sample group. For example, in a case of a price optimization problem, the evaluation unit 40 may evaluate an optimization result by calculating a profit on the basis of a specified price.
Further, the evaluation unit 40 may evaluate the optimization result by summing up the optimization result with each objective function. Specifically, the evaluation unit 40 may calculate, as the optimization result, an average of the optimization result with each objective function.
The output unit 50 outputs an optimized result. The output unit 50 may output the optimized result and the evaluation for the result. The output unit 50 may display the optimization result on a display device (not shown) or may store the optimization result in the storage unit 10.
The learning unit 20, the optimization unit 30, the evaluation unit 40, and the output unit 50 are realized by a processor (for example, a central processing unit (CPU), a graphics processing unit (GPU), and a field programmable gate array (FPGA)) of a computer that operates in accordance with a program (evaluation program).
The program described above may be stored in, for example, the storage unit 10, and the processor may read the program and operate as the learning unit 20, the optimization unit 30, the evaluation unit 40, and the output unit 50 in accordance with the program. Further, a function of the evaluation system may be provided in a software as a service (SaaS) format.
Each of the learning unit 20, the optimization unit 30, the evaluation unit 40, and the output unit 50 may be realized by dedicated hardware. In addition, part or all of each component of each device may be realized by general purpose or dedicated circuitry, a processor, or the like, or a combination thereof. These may be configured by a single chip or may be configured by a plurality of chips connected via a bus. Part or all of each component of each device may be realized by a combination of the above-described circuitry and the like and a program.
Further, when part or all of each component of the evaluation system is realized by a plurality of information processing apparatuses, circuits, and the like, the plurality of information processing apparatuses, circuits, and the like may be arranged concentratedly or distributedly. For example, the information processing apparatuses, the circuits, and the like may be realized as a form in which each is connected via a communication network, such as a client server system, a cloud computing system, and the like.
Next, an operation of the evaluation system of the present exemplary embodiment will be described.
The learning unit 20 generates a plurality of sample groups from samples to be used for learning (step S11). Then, the learning unit 20 generates a plurality of prediction models while inhibiting overlapping of a sample group to be used for learning among the generated sample groups (step S12). The optimization unit 30 generates an objective function on the basis of an explained variable predicted by the prediction model and a constraint condition of the optimization (step S13). Then, the optimization unit 30 optimizes the generated objective function (step S14). The evaluation unit 40 evaluates the optimization result by using a sample group that has not been used in learning of the prediction model (step S15).
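Steps S11 to S15 can be sketched end to end for the price example. Everything below is an illustrative assumption (synthetic data, a least-squares demand model, a price grid, and evaluation by refitting on the held-out group), not the claimed implementation.

```python
import random

def fit(samples):
    """S12: learn a demand model volume ~ a*price + b by least squares."""
    n = len(samples)
    mx = sum(p for p, _ in samples) / n
    my = sum(v for _, v in samples) / n
    sxx = sum((p - mx) ** 2 for p, _ in samples)
    a = sum((p - mx) * (v - my) for p, v in samples) / sxx
    return a, my - a * mx

def optimize(model, prices):
    """S13-S14: pick the price maximizing price x predicted volume."""
    a, b = model
    return max(prices, key=lambda p: p * (a * p + b))

def evaluate(price, holdout):
    """S15: score the chosen price with a model learned from the
    sample group that was NOT used to choose it."""
    a, b = fit(holdout)
    return price * (a * price + b)

random.seed(4)
# Synthetic (price, sales volume) observations: volume falls as price rises.
data = [(p, 100 - 5 * p + random.gauss(0, 3))
        for p in [random.uniform(2, 14) for _ in range(300)]]
random.shuffle(data)
g1, g2 = data[:150], data[150:]          # S11: two disjoint sample groups
m1, m2 = fit(g1), fit(g2)                # S12: two prediction models
grid = [6.0, 8.0, 10.0, 12.0]            # constraint: admissible prices
z1, z2 = optimize(m1, grid), optimize(m2, grid)
score = (evaluate(z1, g2) + evaluate(z2, g1)) / 2  # cross-evaluated average
print(z1, z2, score)
```

Because each optimized price is scored with data it never saw, the averaged score avoids the optimistic reuse of samples described earlier.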
As described above, in the present exemplary embodiment, the learning unit 20 generates a plurality of sample groups, and generates a plurality of prediction models while inhibiting overlapping of a sample group to be used for learning. Further, the optimization unit 30 generates and optimizes an objective function on the basis of an explained variable (prediction target) predicted by the prediction model and a constraint condition for the optimization. Then, the evaluation unit 40 evaluates the optimization result by using a sample group that has not been used in learning of the prediction model. Therefore, it is possible to perform evaluation while suppressing an optimistic bias in prediction optimization.
Hereinafter, a description is given of the reason why the evaluation system of the present exemplary embodiment generates an unbiased estimator. In the context of algorithmic investment, a method called the holdout method is known. The following description can essentially be regarded as an extension of the holdout method to general prediction optimization problems.
One reason that the value f(ẑ, θ̂) includes a bias is that ẑ and θ̂ are dependent random variables. In fact, when ẑ and θ̂ are independent, the relationship of Equation 7 shown below follows directly from the assumption that θ̂ is an unbiased estimator of θ*.
[Formula 10]
E_x[f(ẑ, θ̂)] = E_x[f(ẑ, θ*)]   (Equation 7)
The main idea of the cross-validation method (standard cross-validation in machine learning) is to divide the data x ∈ X^N into two parts x1 ∈ X^{N1} and x2 ∈ X^{N2}, where N1 + N2 = N. Note that x1 and x2 are independent random variables, since each element of x1 and x2 independently follows the distribution p. Hereinafter, the estimator based on x1 is denoted by θ̂1, and the estimator based on x2 by θ̂2.
Further, an optimal strategy based on each estimator is expressed by the following Equation 8.
[Formula 11]
ẑi := arg max_{z∈Z} f(z, θ̂i)   (Equation 8)
At this time, ẑ1 and θ̂2 are independent, and ẑ2 and θ̂1 are also independent. Therefore, the following Equation 9 is satisfied.
[Formula 12]
E_x[f(ẑ1, θ̂2)] = E_x[f(ẑ1, θ*)]   (Equation 9)

Furthermore, when N1 is large enough,

[Formula 13]
E_x[f(ẑ1, θ*)]

becomes closer to

[Formula 14]
E_x[f(ẑ, θ*)]
This idea can be extended to K-fold cross-validation, which divides the data x into K parts x1, . . . , xK.
The strategy z̃k is calculated from {x1, . . . , xK}\{xk}, and the estimator θ̂k is calculated from xk. At this time, the value CVk shown in the following Equation 10 satisfies the following Equation 11. Here, z̃ in Equation 11 represents a strategy calculated from (K−1)N′ samples.

[Formula 15]
CVk := f(z̃k, θ̂k)   (Equation 10)
E_x[CVk] = E_x[f(z̃, θ*)]   (Equation 11)

Here, z̃k is defined by the following Equation 12, where θ̃k is the estimator based on {x1, . . . , xK}\{xk}.

[Formula 16]
z̃k ∈ arg max_{z∈Z} f(z, θ̃k)   (Equation 12)
Then, the evaluation unit 40 evaluates the optimization result by calculating Equation 13 shown below, that is, the average of the values CVk (step S24), and the output unit 50 outputs the evaluation result (step S25).

[Formula 17]
ρ(x) := (1/K) Σ_{k=1}^{K} CVk   (Equation 13)
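Returning to the coin-toss setting, the K-fold procedure of Equations 10 to 13 can be sketched as follows. The function names are illustrative assumptions; the per-fold estimate θ̂k plays the role of the held-out evaluation, and the fold layout is one simple choice among many.

```python
import random

def theta_hat(tosses):
    """Estimated head probability (the parameter estimate)."""
    return sum(tosses) / len(tosses)

def f(z, theta):
    """Expected profit of betting z when the head probability is theta."""
    return theta if z == "head" else 1.0 - theta

def best_strategy(theta):
    """z maximizing f(z, theta)."""
    return "head" if theta >= 0.5 else "tail"

def cv_estimate(tosses, K=5):
    """Average of CV_k: optimize without fold k, score with fold k alone."""
    folds = [tosses[i::K] for i in range(K)]
    cvs = []
    for k in range(K):
        train = [t for i, fold in enumerate(folds) if i != k for t in fold]
        z_tilde_k = best_strategy(theta_hat(train))    # cf. Equation 12
        cvs.append(f(z_tilde_k, theta_hat(folds[k])))  # cf. Equation 10
    return sum(cvs) / K                                # cf. Equation 13

random.seed(3)
reps, n_tosses = 20_000, 15
naive_sum = cv_sum = 0.0
for _ in range(reps):
    tosses = [1 if random.random() < 0.5 else 0 for _ in range(n_tosses)]
    th = theta_hat(tosses)
    naive_sum += f(best_strategy(th), th)  # reuses the same data twice
    cv_sum += cv_estimate(tosses)
naive_mean, cv_mean = naive_sum / reps, cv_sum / reps
print(naive_mean, cv_mean)  # naive sits well above the true 0.5; CV is near 0.5
```

Because the strategy chosen without fold k is independent of fold k's estimate, each CVk is free of the selection-induced optimism, and their average concentrates near the true expected profit of 0.5.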
That is, Equation 14 shown below is calculated by the algorithm exemplified in
Next, an outline of the present invention will be described.
Such a configuration makes it possible to perform an evaluation while suppressing an optimistic bias in prediction optimization.
Further, the optimization unit 82 may generate an objective function on the basis of each generated prediction model, and optimize each generated objective function. Then, the evaluation unit 83 may evaluate the optimization result by summing up the optimization result with each objective function.
Specifically, the evaluation unit 83 may calculate, as the optimization result, an average of the optimization result with each objective function.
Further, the learning unit 81 may generate two sample groups from samples to be used for learning, generate a first prediction model by using a first sample group, and generate a second prediction model by using a second sample group. Further, the optimization unit 82 may generate a first objective function on the basis of an explained variable predicted by the first prediction model, generate a second objective function on the basis of an explained variable predicted by the second prediction model, and optimize the generated first and second objective functions. Then, the evaluation unit 83 may evaluate the optimization result of the first objective function by using the second sample group, and evaluate the optimization result of the second objective function by using the first sample group.
Specifically, the learning unit 81 may generate a plurality of prediction models for predicting a sales volume of a product. Further, the optimization unit 82 may generate an objective function for calculating sales on the basis of a sales volume based on the prediction model and a selling price of the product, and specify a price of a product for maximizing total sales by optimizing the generated objective function. Then, the evaluation unit 83 may evaluate the optimization result by calculating a profit on the basis of the specified price.
At that time, the optimization unit 82 may generate an objective function using a possible selling price of each product as a constraint condition.
The evaluation system described above is implemented in the computer 1000. Then, an operation of each processing unit described above is stored in the auxiliary storage device 1003 in a form of a program (evaluation program). The processor 1001 reads out the program from the auxiliary storage device 1003, develops the program in the main storage device 1002, and executes the above processing in accordance with the program.
Note that, in at least one exemplary embodiment, the auxiliary storage device 1003 is an example of a non-transitory tangible medium. Other examples of the non-transitory tangible medium include a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, and the like, connected via the interface 1004. Further, in a case where this program is distributed to the computer 1000 by a communication line, the computer 1000 that has received the distribution may develop the program in the main storage device 1002 and execute the above processing.
Further, the program may be for realizing some of the functions described above. Further, the program may be a program that realizes the above-described functions in combination with another program already stored in the auxiliary storage device 1003, that is, a so-called difference file (difference program).
Although the present invention has been described with reference to the exemplary embodiment and examples, the present invention is not limited to the above exemplary embodiment and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
This application claims priority based on U.S. Provisional Application No. 62/580,672, filed on Nov. 2, 2017, the entire disclosure of which is incorporated herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2018/030543 | 8/17/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62580672 | Nov 2017 | US |