The embodiment relates to a parameter selection method, a parameter selection program, and an information processing device.
In machine learning, parameters are tuned, for example, by selecting candidate parameter values on a grid for the training data, performing learning with each selected parameter, and evaluating the predicted values obtained by the learning, so as to specify the parameter that obtains the most appropriate evaluation value and thereby optimize the parameters.
Related art is disclosed in Japanese Laid-open Patent Publication No. 8-272761, Japanese Laid-open Patent Publication No. 5-265512 and Japanese Laid-open Patent Publication No. 10-301975.
According to an aspect of the embodiments, a parameter selection method includes processing, performed by a computer, of: calculating a response surface that predicts an evaluation value, from evaluation values obtained from training data and sets of parameter values, which are stored in a memory; working out, from each of maximum evaluation values among the obtained evaluation values, shortest distances to a contour line defined at a position equal to or smaller than the maximum evaluation values on the calculated response surface; and specifying a set of parameter values farthest from the contour line, from among the shortest distances worked out for each of the maximum evaluation values.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
There is a technique of, for example, working out the sensitivity of each parameter with respect to an evaluation function and preferentially tuning highly sensitive parameters, thereby significantly reducing the number of iterations required for convergence.
However, even when a parameter that obtains the optimum evaluation value on the training data is selected, the evaluation on the real data is sometimes low, because the tendency of the training data sometimes differs from that of the real data.
Therefore, it is desirable to improve the determination accuracy.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In machine learning, learning is performed using training data, and the prediction result of the learning is evaluated using evaluation data. Furthermore, tuning is performed such that the correct answer can be derived, using data for which the correct answer is known in advance. However, for example, since the data for which the correct answer is known is often past data, or only a part of the data is available, the tuned machine learning does not always give the correct answer in actual operation.
The main causes of this are deemed to be a difference in tendency between the training data and the real data, the influence of noise, and the like. Furthermore, since the number of pieces of data differs between the training data and the real data, it is sometimes difficult to learn all the tendencies of the real data from the training data.
For this reason, it is necessary to perform tuning with as high versatility as possible. There are existing techniques for increasing versatility under general conditions; however, when the evaluation values are quantized or affected by noise, for example, a plurality of tuning results that find the most favorable evaluation may be given. In such cases, it is difficult to select a parameter that finds a good evaluation value for the real data.
The present embodiments provide a parameter selection method, a parameter selection program, and an information processing device that, when a plurality of tuning results that finds the most favorable evaluation is given, select a tuning result that is deemed to have the highest versatility, from among the plurality of results.
That is, for each set of parameter values that gives a best evaluation point, the distance to the closest boundary of the parameter space that includes the best evaluation points is worked out, and the set of parameter values farthest from the boundary is acquired. Each best evaluation point indicates the highest evaluation value and the set of parameter values at that time.
Meanwhile, since the scale differs depending on the parameter, the parameter values are normalized. Furthermore, since there are parameters that affect the evaluation value and parameters that hardly affect it, it is desirable to exclude, using a threshold value or the like, the parameters that hardly affect the evaluation value, and to perform the evaluation using only the parameters that contribute to the evaluation value to a certain extent or more.
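As an illustrative sketch only (not part of the original disclosure), the normalization and the threshold-based screening of low-contribution parameters might be realized as follows; the contribution scores and the threshold value are hypothetical.

import numpy as np

def normalize_parameters(param_values):
    """Min-max normalize each parameter column to [0, 1] so that distances
    are comparable across parameters with different scales."""
    p = np.asarray(param_values, dtype=float)
    p_min = p.min(axis=0)
    p_range = p.max(axis=0) - p_min
    p_range = np.where(p_range == 0.0, 1.0, p_range)  # guard constant columns
    return (p - p_min) / p_range

def select_contributing_parameters(contributions, threshold=0.05):
    """Keep only the indices of parameters whose contribution to the
    evaluation value is at or above a (hypothetical) threshold."""
    return [i for i, c in enumerate(contributions) if c >= threshold]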
The boundary of the parameter space that includes the best evaluation points is where a "change point" is located, at which a slight difference in one or more parameter values in a set of parameters greatly affects the evaluation value. The parameter space that includes the best evaluation points is called the "best parameter space".
In addition, the boundary of the best parameter space represents the shape of the solution. A point on the boundary, at which the evaluation result deteriorates if even one parameter value is slightly different, is deemed to be a point at which the evaluation result can change (that is, a point immediately before the evaluation value deteriorates) even when the evaluated data differs only slightly. Therefore, such a point corresponds to a set of parameter values that carries a risk of adverse influence when used on real data.
The present embodiment is as follows.
(1) How stably a good evaluation value can be obtained at a point is quantified.
(2) A predicted surface of the solution is synthesized from the evaluation result. As an example, a response surface is generated.
(3) A contour line is generated at a height defined for the generated response surface.
(4) A shortest distance between each best evaluation point and the contour line is worked out.
(5) Among the worked-out shortest distances, a best evaluation point with the largest distance from the closest contour line is determined to be an evaluation point farthest from the boundary. In order to obtain a set of parameters that stably finds a good evaluation value, a slope outside the boundary may also be evaluated, and the shortest distance may be evaluated by raising the score using the evaluation result for the slope.
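For concreteness, a minimal self-contained sketch of the flow of (2) to (5) is given below; it is not the original implementation. It assumes two normalized parameters, uses scipy's griddata to predict the response surface on a grid, sets the contour height to a fixed ratio of the best evaluation value, and approximates the contour by the boundary cells of the region at or above that height.

import numpy as np
from scipy.interpolate import griddata

def pick_stable_best(params, evals, contour_ratio=0.9, grid_n=101):
    """params: (n_samples, 2) normalized parameter values in [0, 1];
    evals: (n_samples,) evaluation values (higher is better)."""
    params = np.asarray(params, dtype=float)
    evals = np.asarray(evals, dtype=float)

    # Step (2): predict the response surface on a dense grid.
    g = np.linspace(0.0, 1.0, grid_n)
    gx, gy = np.meshgrid(g, g)
    surface = griddata(params, evals, (gx, gy), method="cubic")
    surface = np.where(np.isnan(surface), evals.min(), surface)  # fill outside the hull

    # Step (3): approximate the contour by the boundary cells of the region
    # whose predicted value is at or above the defined height (note that
    # np.roll wraps at the grid edges; acceptable for a sketch).
    height = contour_ratio * evals.max()
    inside = surface >= height
    shifted = [np.roll(inside, s, axis=a) for a in (0, 1) for s in (1, -1)]
    boundary = inside & ~np.logical_and.reduce(shifted)
    bxy = np.column_stack([gx[boundary], gy[boundary]])  # assumes a contour exists

    # Steps (4)-(5): shortest distance from each best evaluation point to the
    # boundary; keep the best evaluation point with the largest such distance.
    best = params[evals == evals.max()]
    dists = [np.min(np.linalg.norm(bxy - p, axis=1)) for p in best]
    return best[int(np.argmax(dists))]

The boundary-cell approximation of the contour is merely a convenient choice for this sketch; the response surface generation and contour line generation are described in more detail later.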
Among the above (1) to (5), the reason for performing (4) will be described. First, the following methods will be examined.
<Method A>
This is a method that adopts a point closest to the center of the best parameter space.
<Method B>
This is a method that adopts a set of parameters that first obtains the best evaluation point, a set of parameters that finally obtains the best evaluation point, or a set of parameters that is randomly selected.
<Method C>
This is a method that adopts a set of parameters by analyzing the sensitivity of the parameters.
Among the six points in the best parameter space 4sp, a point 4a corresponds to the point closest to the center and is the best evaluation point selected by the method A. On the other hand, in the present embodiment, a point 4b having the largest distance from the contour line 3a is selected as the best evaluation point. In this manner, the point 4a close to the center selected by the method A is not always far from the boundary (contour line 3a). The method A sometimes fails to specify the point 4b that more stably finds a good evaluation value.
Next, as for the method B, any one of the six points in the best parameter space 4sp is selected in the method B. That is, a point closer to the boundary of the contour line 3a than the point 4a selected by the method A is likely to be selected. The method B has a higher possibility than the method A of failing to specify the point 4b that stably finds a good evaluation value.
Then, as for the method C, a set of parameters is selected by analyzing the sensitivity of the parameters in the method C. Conventionally, a set of parameters has been selected by the method C, but when there is an interaction between parameters, it is not sufficient to analyze the sensitivity of each parameter individually. The sensitivity of a parameter corresponds to the degree of contribution to the evaluation.
In the present embodiment, the influence of a plurality of parameters can be considered at the same time by selecting the point 4b that stably finds the best evaluation point, based on the distance to the boundary (contour line 3a) of the best parameter space 4sp. Here, the distance to be computed in the present embodiment will be described.
In the present embodiment, for each point 4w in the best parameter space 4sp, the distance from the point 4w to the contour line 3a is computed by working out the foot of a perpendicular dropped onto the contour line 3a (that is, a perpendicular to a tangent line to the contour line 3a), and the shortest of the computed distances is employed as the distance of the point 4w. In this example, distances D1, D2, D3, and D4 are worked out, and the distance D1, which is the shortest among the distances D1 to D4, is employed as the distance of the point 4w. The distances D1 to D4 are indicated by broken lines in the drawing.
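As an assumption of how this perpendicular-foot distance might be computed (not the original implementation), the shortest distance from a point to a contour line approximated by a polyline can be obtained by projecting the point onto each segment and clamping the foot to the segment ends:

import numpy as np

def shortest_distance_to_polyline(point, polyline):
    """Shortest Euclidean distance from a point such as 4w to a contour line
    approximated by a polyline of vertices with shape (N, 2)."""
    p = np.asarray(point, dtype=float)
    v = np.asarray(polyline, dtype=float)
    a, b = v[:-1], v[1:]                       # consecutive vertices form segments
    ab = b - a
    denom = np.einsum("ij,ij->i", ab, ab)
    denom = np.where(denom == 0.0, 1.0, denom)  # guard zero-length segments
    t = np.clip(np.einsum("ij,ij->i", p - a, ab) / denom, 0.0, 1.0)
    foot = a + t[:, None] * ab                 # foot of the perpendicular, clamped
    return float(np.min(np.linalg.norm(p - foot, axis=1)))

The distance of the point 4w is then the minimum over all contour segments, which corresponds to D1 in the example above.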
Since most existing optimization problems have relatively homogeneous parameters, the shape of the solution is a simple shape such as a circle. Therefore, selecting the point close to the median point has been sufficient. However, in machine learning, the features differ depending on the parameters, and the solution has a complicated shape. In the present embodiment, in such machine learning, by quantifying the stability of each best evaluation point by its shortest distance to the contour line 3a, one best evaluation point can be suitably selected from among a plurality of best evaluation points.
An information processing device that implements the parameter selection processing of the present embodiment as described above has, for example, the following hardware configuration.
The information processing device 100 includes a central processing unit (CPU) 11, a main storage device 12, an auxiliary storage device 13, an input device 14, a display device 15, a communication interface (I/F) 17, and a drive device 18.
The CPU 11 corresponds to a processor that controls the information processing device 100 in accordance with a program stored in the main storage device 12. A random access memory (RAM), a read only memory (ROM), and the like are used for the main storage device 12, and a program executed by the CPU 11, data necessary for the processing by the CPU 11, data obtained by the processing by the CPU 11, and the like are stored or temporarily saved in the main storage device 12.
A hard disk drive (HDD) or the like is used for the auxiliary storage device 13, and stores data such as programs for executing various types of processing. Various types of processing are implemented by loading a part of the programs stored in the auxiliary storage device 13 into the main storage device 12 and executing the loaded part of the programs in the CPU 11. The main storage device 12, the auxiliary storage device 13, and an external storage device and the like that can be accessed by the information processing device 100 are collectively referred to as a storage unit 130.
The input device 14 is used by a user to input various types of information necessary for the processing by the information processing device 100. The display device 15 displays various types of necessary information under the control of the CPU 11. The input device 14 and the display device 15 may be a user interface implemented by an integrated touch panel or the like. The communication I/F 17 performs communication through a network, for example, by wire or wirelessly. The communication by the communication I/F 17 is not limited to wireless or wired communication.
The drive device 18 interfaces a storage medium 19 (for example, a compact disc read-only memory (CD-ROM)) set in the drive device 18 with the information processing device 100.
The program that implements the processing performed by the information processing device 100 is provided to the information processing device 100, for example, via the drive device 18 by the storage medium 19 such as a CD-ROM. Note that the storage medium 19 that stores the program is not limited to the CD-ROM, and only needs to be one or more non-transitory and tangible media having a computer-readable structure. Besides the CD-ROM, the computer-readable storage medium may be a digital versatile disk (DVD) disk, a portable recording medium such as a universal serial bus (USB) memory, or a semiconductor memory such as a flash memory.
An exemplary functional configuration of a machine learning unit to which the present embodiment is applied will be described.
The machine learning unit 200 of the present embodiment includes a prediction unit 71, an evaluation unit 73, and a selection unit 80.
The prediction unit 71 receives an input of the input data 51, and executes prediction processing a to predict a target event. A parameter Pa is set in the prediction processing a. The input data 51 corresponds to training data. The parameter Pa indicates the parameter value for the prediction processing a. The output data 53, which indicates the prediction result, is output to the storage unit 130 by the prediction processing a. For simplicity, the prediction unit 71 will be described as performing only the prediction processing a, but may execute two or more types of processing.
The evaluation unit 73 evaluates the accuracy of the output data 53 and accumulates the evaluation result 55 in the storage unit 130. The evaluation result 55 indicates a set of parameter values and an evaluation value. The accuracy of the output data 53 is evaluated by calculating a prediction error. As an example, the root mean squared error (RMSE) can be used.
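A brief sketch of the RMSE computation is shown below, assuming the predicted and actual values are available as arrays; how this error is converted into the evaluation value recorded in the evaluation result 55 is not specified here.

import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual values;
    a smaller error corresponds to a better prediction."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))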
The selection unit 80 selects the best parameter value 97 using a table in which the evaluation results 55 are accumulated (the "evaluation result accumulation table 57" described later), and sets the best parameter value 97 as the parameter Pa. The processing performed by the prediction unit 71, the evaluation unit 73, and the selection unit 80 is repeated with the newly set parameter Pa. When the difference between the previous parameter Pa and the current parameter Pa is equal to or less than a predefined determination value for determining whether or not convergence has been obtained, the current parameter Pa is adopted as the optimized Pa used when prediction is actually made.
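A hedged sketch of this repetition and convergence check follows; the callables and the determination value `tol` are hypothetical stand-ins for the prediction, evaluation, and selection processing.

import numpy as np

def tune_until_converged(initial_pa, run_prediction, evaluate, select_best, tol=1e-3):
    """Repeat prediction, evaluation, and selection until the change in the
    parameter set Pa falls to or below the determination value `tol`."""
    pa = np.asarray(initial_pa, dtype=float)
    while True:
        output = run_prediction(pa)                 # prediction processing a
        evaluate(output)                            # accumulates an evaluation result 55
        new_pa = np.asarray(select_best(), dtype=float)  # best parameter value 97
        if np.max(np.abs(new_pa - pa)) <= tol:
            return new_pa                           # adopted as the optimized Pa
        pa = new_pa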
A case where such a machine learning unit 200 of the present embodiment predicts the amount of electric power from meteorological data will be described as an application example.
The evaluation unit 73 evaluates the accuracy of the output data 53 with reference to actual data 54 of the amount of electric power, and outputs the evaluation result 55. The actual data 54 of the amount of electric power corresponds to teacher data. The selection unit 80 according to the present embodiment obtains the best parameter value 97 using the accumulation table of the evaluation results 55. The best parameter value 97 is set as the parameter Pa of the prediction processing a.
The prediction unit 71 sets the updated parameter Pa in the prediction processing a, and predicts the amount of electric power from the input data 51 of the meteorological data to update the output data 53. The evaluation unit 73 evaluates the accuracy of the output data 53 based on the actual amount of electric power, and outputs the evaluation result 55. The selection unit 80 according to the present embodiment obtains the best parameter value 97 using the accumulation table of the evaluation results 55. The best parameter value 97 is set as the parameter Pa.
Preferably, when the best parameter value 97 currently obtained is substantially the same as the previous best parameter value 97, the current best parameter value 97 is stored in the storage unit 130 as the optimized Pa, and used for actual prediction of the amount of electric power.
The meteorological data used as the input data 51 is exemplified as follows.
The date and time indicates the day of the month and year and the time of measurement. In this example, the values of the temperature (° C.), precipitation (mm), sunshine duration (hours), wind speed (m/s), wind direction, local pressure (hPa), relative humidity (%), snowfall (cm), and the like measured hourly on Jan. 1, 2017 are recorded.
As an example, for the date and time "2017/1/1 1:00", the temperature "5.1" ° C., the precipitation "0" mm, the sunshine duration "0" hours, the wind speed "3.5" m/s, the wind direction "west-northwest", the local pressure "1019.8" hPa, the relative humidity "73"%, and the snowfall "0" cm are recorded.
With respect to such meteorological data, the predicted value of the amount of electric power for each date and time entry is output from the prediction unit 71 as the output data 53. Then, the accuracy of the output data 53 is evaluated using the actual data 54 of the amount of electric power.
The evaluation unit 73 evaluates the accuracy of the output data 53 based on the predicted value of the amount of electric power for each date and time entry indicated by the output data 53, and the actual value of the actual data 54 of the amount of electric power, and outputs the evaluation result 55. The evaluation result 55 is accumulated in the storage unit 130 every time the evaluation unit 73 runs, and is used by the selection unit 80 as the evaluation result accumulation table 57.
The response surface generation unit 81 performs response surface generation processing of generating a response surface using the evaluation result accumulation table 57. The response surface information 91 is output to the storage unit 130 by the response surface generation unit 81. The contour line generation unit 83 performs contour line generation processing of generating a contour line for the generated response surface. The contour line information 93 is output to the storage unit 130 by the contour line generation unit 83.
The best evaluation point selection unit 85 performs best evaluation point selection processing of selecting the best evaluation point from among all the evaluation points. In the present embodiment, a plurality of best evaluation points having the same evaluation value may be selected. The best evaluation point information 95 is output to the storage unit 130 by the best evaluation point selection unit 85.
The farthest point specifying unit 87 performs farthest point specifying processing of referring to the contour line information 93 and the best evaluation point information 95 to specify a contour line of the best parameter space 4sp that includes a plurality of best evaluation points, and of specifying, from among the plurality of best evaluation points, the best evaluation point farthest in distance from the specified contour line. The farthest point specifying unit 87 computes the distance to the contour line for each of the plurality of best evaluation points, and specifies the best parameter value 97 having the longest distance. The best parameter value 97 is stored in the storage unit 130 as the parameter Pa.
The evaluation result accumulation table 57 is a data table in which the already obtained evaluation results 55 are accumulated, where the evaluation point and the used parameters Pa (a set of parameters P1, P2, P3, . . . , and Pa) are treated as one record, and has the number of records equal to the number of evaluations.
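For illustration only, one record of such a table could be represented as follows; the field names are hypothetical, and the example values are taken from the best evaluation points MP_a and MP_b described later.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class EvaluationRecord:
    """One record of the evaluation result accumulation table 57: the set of
    parameter values used and the evaluation value obtained with them."""
    parameters: Tuple[float, ...]   # (P1, P2, P3, ...)
    evaluation: float

# The table has as many records as evaluations performed, for example:
table = [
    EvaluationRecord(parameters=(0.0, 0.1), evaluation=3.0),
    EvaluationRecord(parameters=(0.85, 0.85), evaluation=3.0),
]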
The response surface information 91 indicates a response surface function that has been obtained, the parameter value of the surface function, and the like. The contour line information 93 indicates the height of the contour line to be worked out, and the response surface information. The height of the contour line to be worked out is designated by a value obtained by multiplying the highest value in evaluation by a defined ratio. As an example, a value obtained by multiplying the highest value by 0.9 is indicated.
The best evaluation point information 95 corresponds to a table that indicates a set of parameter values that has obtained the highest value among the evaluation points, and the obtained highest value. The selection information 99 is information that indicates at least the best parameter value 97. The selection information 99 may further indicate the highest value in evaluation.
The response surface generation unit 81 generates a response surface in an n-dimensional space defined by the evaluation values and the parameter values indicated in the evaluation result accumulation table 57 (step S302). In generating the response surface, the shape of the solution can be expressed in n dimensions by using a response surface method or the like that works out an approximate function that passes near a known point, and predicts the shape of the solution to obtain an optimum solution. The response surface information 91 is output to the storage unit 130. Then, a contour line is created for the response surface that represents the obtained shape of the solution.
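One possible response surface calculation is sketched below under the assumption of two parameters and a quadratic polynomial fitted by ordinary least squares; the response surface method admits other basis functions, and this is not the original implementation.

import numpy as np

def fit_quadratic_response_surface(params, evals):
    """Fit z = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2 by least squares
    and return a function that predicts the evaluation value at (x, y)."""
    x, y = np.asarray(params, dtype=float).T
    z = np.asarray(evals, dtype=float)
    A = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)

    def predict(px, py):
        return (coeffs[0] + coeffs[1] * px + coeffs[2] * py
                + coeffs[3] * px * px + coeffs[4] * px * py + coeffs[5] * py * py)

    return predict

The returned predict function can then be evaluated on a grid to obtain the shape of the solution.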
The contour line generation unit 83 generates a contour line at a position lower than the best evaluation point by a defined value (step S303). The contour line generation unit 83 works out the height of the contour line on the response surface represented by the response surface information 91 using a preset value (for example, 0.9), and generates the contour line at the worked-out height.
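A sketch of one way to realize the contour line generation follows, assuming the response surface is sampled on a regular grid over normalized parameters in [0, 1] and that scikit-image (skimage.measure.find_contours) is available; the grid resolution is arbitrary, the ratio 0.9 follows the example above, and `predict` is any callable that evaluates the response surface, such as the quadratic sketch shown earlier.

import numpy as np
from skimage import measure

def contour_at_ratio(predict, best_value, ratio=0.9, grid_n=101):
    """Sample the response surface on a [0, 1] x [0, 1] grid and extract
    contour polylines at a height of `ratio` times the best evaluation value."""
    g = np.linspace(0.0, 1.0, grid_n)
    gx, gy = np.meshgrid(g, g, indexing="ij")
    surface = predict(gx, gy)
    height = ratio * best_value
    # Each contour is an (N, 2) array of fractional (row, col) grid indices;
    # convert the indices back to parameter values.
    contours = measure.find_contours(surface, height)
    spacing = g[1] - g[0]
    return [np.column_stack([g[0] + c[:, 0] * spacing,
                             g[0] + c[:, 1] * spacing]) for c in contours]

The returned polylines can be fed to the point-to-polyline distance sketch shown earlier.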
Meanwhile, the best evaluation point selection unit 85 acquires, from among the evaluation values in the evaluation result accumulation table 57, the highest value and a set of parameter values that has obtained the highest value, as the best evaluation point (step S304). A plurality of best evaluation points may be acquired. The best evaluation point information 95 is stored in the storage unit 130.
Then, the farthest point specifying unit 87 computes the shortest distance to the contour line for each of the best evaluation points indicated by the best evaluation point information 95 (step S305). Next, the farthest point specifying unit 87 specifies the best evaluation point with the largest obtained shortest distance, and acquires a parameter value from the specified best evaluation point to set the acquired parameter value as the best parameter value 97 (step S306). The selection information 99 that indicates the best parameter value 97 is output to the storage unit 130. The best evaluation point that indicates the best parameter value 97 and the highest value may be set in the selection information 99. Then, the selection processing by the selection unit 80 ends.
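A compact sketch of steps S305 and S306 is shown below, under the assumption that the best evaluation points and the contour polylines have already been obtained (for example, with the sketches above); the distance here is approximated by the nearest contour vertex, whereas the perpendicular-foot computation sketched earlier gives the exact distance to the polyline.

import numpy as np

def specify_farthest_best_point(best_points, contours):
    """Steps S305-S306 (sketch): for each best evaluation point, compute the
    shortest distance to any contour vertex, then return the point whose
    shortest distance is largest, i.e. the best parameter value 97."""
    best_points = np.asarray(best_points, dtype=float)
    vertices = np.vstack(contours)                  # all contour vertices, shape (M, 2)
    shortest = np.array([np.min(np.linalg.norm(vertices - p, axis=1))
                         for p in best_points])
    idx = int(np.argmax(shortest))
    return best_points[idx], float(shortest[idx])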
Next, exemplary selection processing will be described using a specific example.
In the response surface processing by the response surface generation unit 81, when the number of records in the evaluation result accumulation table 57 does not reach a predefined number of samples, the response surface 6 can be generated using all the records. When the evaluation result accumulation table 57 contains the number of records equal to or greater than the predefined number of samples, a number of records equal to the number of samples can be extracted in descending order of evaluation value. Then, the contour line generation processing by the contour line generation unit 83 defines the contour line 3a on the generated response surface 6.
Once the best evaluation point with the highest value “3” in evaluation is obtained by the best evaluation point selection unit 85, the farthest point specifying processing by the farthest point specifying unit 87 is performed. At the best evaluation point in this example, the values of the parameters P1 and P2 that have obtained the highest value “3” in evaluation are also indicated.
In this example, it is indicated for the best evaluation point MP_a that the value of the parameter P1 is “0.0”, the value of the parameter P2 is “0.1”, the evaluation value is “3.0”, and the distance is “0.10”. It is indicated for the best evaluation point MP_b that the value of the parameter P1 is “0.85”, the value of the parameter P2 is “0.85”, the evaluation value is “3.0”, and the distance is “0.011”.
Only the values of the distance will be indicated below. The distance of the best evaluation point MP_c is “0.070”, the distance of the best evaluation point MP_d is “0.050”, the distance of the best evaluation point MP_e is “0.038”, and the distance of the best evaluation point MP_f is “0.067”.
In this manner, the shortest distance to the boundary defined by the contour line 3a is worked out for each of the best evaluation points, and the best evaluation point that is farthest from the boundary is selected based on the worked-out shortest distances; consequently, a set of parameter values that more stably finds a good evaluation value can be selected. The determination accuracy when one best evaluation point is selected from among a plurality of best evaluation points can thus be improved.
In the above description, the response surface generation unit 81 corresponds to an example of a response surface calculation unit, and the farthest point specifying unit 87 corresponds to an example of a specifying unit.
The present invention is not limited to the embodiments specifically disclosed above, and various modifications and changes can be made without departing from the scope of the claims.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2018/019661 filed on May 22, 2018 and designated the U.S., the entire contents of which are incorporated herein by reference.
Parent application: PCT/JP2018/019661 (filed May 2018); child application: U.S. application Ser. No. 17/098,950.