This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2018-174292, filed on Sep. 18, 2018, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a non-transitory computer-readable recording medium, a prediction method, and a learning device.
In machine learning, when discretizable data is handled, a learning model based on a decision tree is often used because of its readability and interpretability. When a single decision tree is used, however, a learning model that deviates from the actual phenomenon may be created owing to a lack of learning data, inaccuracy of the learning data, the choice of parameters, or the like at the time of building the learning model, and the possibility that the accuracy at the time of application becomes extremely low cannot be excluded.
In recent years, a random forest has been used, in which the data and characteristics to be used for learning are changed at random to create a plurality of decision trees, and decision making is performed based on the principle of majority rule. In the random forest, even if a tree that extremely decreases the accuracy at the time of application is included, a certain degree of accuracy can be ensured as a whole.
In applying machine learning, a stable learning model in which there are some grounds for a result is needed in some cases, for example, when it is desired to examine whether the result is reasonable or whether there is another possibility, or when it is desired to compare the result with that of a past learning model.
However, since the random forest involves randomness in the creation of a learning model, the decision trees to be created may be biased. Further, when the learning data is scarce or inaccurate, it is difficult to know how much influence the randomness used in creating the learning model by the random forest has on the output accuracy of the learning model. Therefore, evidence that no bias has occurred cannot be provided, and thus the reliability of the prediction result of the machine learning is not always high. Further, to reduce the influence of randomness, it is conceivable to increase the number of decision trees created at random. However, the learning time and the prediction time by the learned learning model then become long, which is not realistic.
According to an aspect of an embodiment, a non-transitory computer-readable recording medium stores therein a program that causes a computer to execute a process. The process includes creating, by using pieces of training data each including explanatory variables and an objective variable, a plurality of decision trees that are each configured by a combination of the explanatory variables and that each estimate the objective variable based on true or false of the explanatory variables; creating a linear model that is equivalent to the plurality of decision trees and that lists, without omission, all terms configured by combinations of the explanatory variables; and outputting a prediction result from input data by using the linear model.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Preferred embodiments will be explained with reference to accompanying drawings. The present invention is not limited to the embodiments. Further, the respective embodiments can be combined with each other appropriately in a range without causing any contradiction.
Overall Configuration
In a general random forest, since model creation includes randomness, the decision trees to be created may be biased. Therefore, it is not possible to exclude the possibility that the overall accuracy becomes extremely low due to the bias, and the results may change each time learning is performed. On the other hand, if the number of decision trees created at random is increased, the possibility of bias can be reduced; however, more time is needed for learning and application. That is, stable accuracy and high-speed processing are in a trade-off relation with each other, and a learning model that can perform processing at a high speed with stable accuracy has been desired.
Therefore, the learning device 10 according to the first embodiment deterministically creates all the decision trees that can be created from the pieces of training data, and creates a forest model that performs decision making based on the principle of majority rule, thereby creating a stable model. The learning device 10 then converts the forest model to a linear model that returns an equivalent result, to reduce the calculation time at the time of application.
Specifically, as illustrated in
Specifically, the learning device 10 resolves the respective decision trees included in the forest model into paths to build a linear model in which the presence of a path is set as a variable, and decides the coefficient of each variable in the linear model based on the number of decision trees in the forest model that include the corresponding path. Thereafter, the learning device 10 inputs data to be predicted into the decided linear model and performs calculation based on the linear model to output a prediction result.
In this manner, the learning device 10 increases the number of decision trees to improve stability at the time of learning, and can perform prediction without using the decision trees at the time of prediction (application). Therefore, the learning device 10 can reduce the processing time for acquiring stable prediction.
Functional Configuration
The communication unit 11 is a processing unit that controls communication with other devices, and for example, is a communication interface. For example, the communication unit 11 receives training data to be learned, a learning start instruction, data to be predicted, a prediction start instruction, and the like from a manager's terminal or the like, and transmits a training result, a prediction result, and the like to the manager's terminal.
The storage unit 12 is an example of a storage device that stores therein data, a program executed by the control unit 20, and the like, and for example, is a memory or a hard disk. The storage unit 12 stores therein a training data DB 13, a learning result DB 14, a prediction target DB 15, and the like.
The training data DB 13 is a database that stores therein training data, which is data to be learned. Specifically, the training data DB 13 stores therein a plurality of pieces of training data respectively including an explanatory variable and an objective variable. The training data stored herein is stored and updated by a manager or the like.
In the example of
As an example of a certain event, consider determining "whether a risk of heatstroke is high", in which "1" is set to the label Y of training data indicating that the risk of heatstroke is high, and "0" is set to the label Y of the other training data. In this case, an explanatory variable such as the variable A can represent "whether the temperature is 30° C. or higher": when the temperature is 30° C. or higher, the variable A is set to "1", and otherwise the variable A is set to "0".
The learning result DB 14 is a database that stores a learning result by the control unit 20 described later. Specifically, the learning result DB 14 stores a linear model that is a learned learning model and is to be used for prediction.
The prediction target DB 15 is a database that stores data to be predicted. Specifically, the prediction target DB 15 stores data to be predicted that includes a plurality of explanatory variables and is data to be input to the linear model, which is a learned learning model.
The control unit 20 is a processing unit that controls the entire learning device 10, and is, for example, a processor. The control unit 20 includes a learning unit 21, a creation unit 22, and a prediction unit 23. The learning unit 21, the creation unit 22, and the prediction unit 23 are examples of an electronic circuit provided in the processor or of a process performed by the processor.
The learning unit 21 is a processing unit that creates a forest model including a plurality of decision trees based on training data. Specifically, the learning unit 21 reads the respective pieces of training data stored in the training data DB 13, and deterministically creates all the decision trees that can be created from the pieces of training data. The learning unit 21 creates a forest model that determines an output result from the output results of the created decision trees based on the principle of majority rule, and outputs the forest model to the creation unit 22.
For example, the learning unit 21 creates, from the pieces of training data, a plurality of decision trees that are each configured by a combination of explanatory variables and that each estimate the objective variable based on true or false of the explanatory variables. The learning unit 21 can create a decision tree by dividing the data, expressed by sets of variables and their values, into several subsets for each of the pieces of training data. As a method of creating a decision tree, various known methods can be employed.
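Although the embodiment does not prescribe a particular tree-construction algorithm, the following Python sketch, given purely for illustration, builds a single decision tree from binarized training data using scikit-learn; the library, the training values, and the predicted row are assumptions for the example and are not part of the embodiment.

```python
# A minimal sketch of building one decision tree from binarized training data.
# scikit-learn's CART implementation stands in for the "various known methods";
# the data values below are hypothetical and loosely follow the heatstroke example.
from sklearn.tree import DecisionTreeClassifier

# Each row holds binary explanatory variables (e.g., A = "temperature is 30 C or higher"),
# and y holds the binary objective variable (label Y, e.g., "risk of heatstroke is high").
X = [[1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 1, 0, 1]]
y = [1, 1, 0, 0]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)
print(tree.predict([[1, 0, 0, 1]]))  # estimate the objective variable for new data
```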
The creation unit 22 is a processing unit that creates a linear model by using the plurality of decision trees created by the learning unit 21. Specifically, the creation unit 22 creates a linear model that is equivalent to the plurality of decision trees and lists, without omission, all the terms formed by combinations of the explanatory variables. The creation unit 22 stores the linear model in the learning result DB 14 as a learned learning model.
For example, the creation unit 22 resolves the respective decision trees included in the forest model into paths to build a linear model (a partial linear model) in which the presence of a path is set as a variable, and decides the coefficient of each variable in the linear model based on the number of decision trees in the forest model that include the corresponding path.
Conversion from the decision trees to a linear model is specifically explained here.
For such a decision tree, the creation unit 22 creates a cube that is a product term of literals of explanatory variables (see
This point is specifically explained with reference to
The creation unit 22 uses "the sum of cubes corresponding to paths with a leaf being true (+)" or "1−(the sum of cubes corresponding to paths with a leaf being false (−))" to create a linear model equivalent to the decision tree (see
Here, the product of literals appearing in a path from a root to a leaf of a decision tree is referred to as the product term (cube) representing the leaf. The sum of the cubes representing all the positive leaves of a decision tree, or a formula in which the sum of the cubes representing all the negative leaves of the decision tree is subtracted from 1, is a formula of a linear model equivalent to the decision tree. When a forest model formed by one or more decision trees is provided, all the decision trees included in the forest model are converted to formulas of linear models by either method described above. These formulas are added, and the total is then divided by the number of decision trees included in the forest model, which yields a formula of a linear model equivalent to the forest model.
In the example of
Next, the creation unit 22 calculates the sum of the partial linear models created from the respective decision trees based on the "sum of cubes corresponding to paths with the leaf being true (+)", and divides the calculated sum of the partial linear models by the total number of decision trees, to convert the forest model to a linear model that returns an equivalent result.
Subsequently, the creation unit 22 calculates the sum of these partial linear models: "(AB−+A−D)+(A+A−D+A−D−C)+(A−C)"="AB−+2A−D+A+A−D−C+A−C". Thereafter, the creation unit 22 creates "(AB−+2A−D+A+A−D−C+A−C)/3", which is obtained by dividing the sum of the partial linear models by the number of decision trees, which is 3. That is, the creation unit 22 creates "Y=(AB−+2A−D+A+A−D−C+A−C)/3" as a learned learning model (linear model).
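For illustration only, the conversion described above can be sketched in Python as follows, assuming that each decision tree has already been resolved into the cubes of its paths whose leaves are true (+); the string encoding of the cubes (for example, "AB-" standing for the product of A and the negation of B) is a hypothetical representation, not one prescribed by the embodiment.

```python
# A minimal sketch of converting a forest of decision trees into an equivalent
# linear model. Each tree is assumed to be already resolved into the cubes of
# its positive-leaf paths; the coefficient of a cube is the number of trees
# containing it, divided by the total number of trees in the forest model.
from collections import Counter
from fractions import Fraction

# The three trees from the example, as lists of their positive-leaf cubes.
trees = [
    ["AB-", "A-D"],          # partial linear model: AB- + A-D
    ["A", "A-D", "A-D-C"],   # partial linear model: A + A-D + A-D-C
    ["A-C"],                 # partial linear model: A-C
]

counts = Counter(cube for tree in trees for cube in tree)
n_trees = len(trees)
linear_model = {cube: Fraction(c, n_trees) for cube, c in counts.items()}
for cube, coef in linear_model.items():
    print(cube, coef)
# AB- 1/3, A-D 2/3, A 1/3, A-D-C 1/3, A-C 1/3,
# i.e. Y = (AB- + 2A-D + A + A-D-C + A-C) / 3, matching the text.
```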
Returning to
The prediction unit 23 assigns the values of the variables included in the data to be predicted to the linear model, thereby acquiring a score from 0 to 1 inclusive. When the score exceeds 0.5, the objective variable of the data to be predicted can be estimated as 1, and in other cases, the objective variable of the data to be predicted can be estimated as 0.
For example, the prediction unit 23 sets "AB−=1" when the variable A and the variable B in the data to be predicted are "1" and "0", respectively. In this manner, the prediction unit 23 calculates the value obtained by inputting the data to be predicted to the linear model; when the calculated value is equal to or larger than "0.5", the prediction unit 23 predicts the data as "true: +", corresponding to a certain event, and when the calculated value is smaller than "0.5", the prediction unit 23 predicts the data as "false: −", not corresponding to the certain event. The prediction unit 23 displays the prediction result on a display or transmits the prediction result to the manager's terminal.
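A minimal Python sketch of this prediction step is given below, under the assumption that a cube is encoded as a list of (variable, required value) pairs; the encoding, coefficients, and example input values are illustrative and not part of the embodiment.

```python
# A minimal sketch of prediction with the linear model: the values of the
# explanatory variables in the data to be predicted are assigned to the cubes,
# and the resulting score is compared with 0.5.

def cube_value(cube, data):
    # A cube evaluates to 1 only when every literal matches the input data.
    return 1 if all(data[var] == val for var, val in cube) else 0

def predict(linear_model, data):
    # linear_model: list of (coefficient, cube) pairs. Returns (score, label).
    score = sum(coef * cube_value(cube, data) for coef, cube in linear_model)
    return score, 1 if score >= 0.5 else 0  # "true: +" when the score is 0.5 or more

# Y = (AB- + 2A-D + A + A-D-C + A-C) / 3 from the earlier example.
model = [
    (1/3, [("A", 1), ("B", 0)]),            # AB-
    (2/3, [("A", 0), ("D", 1)]),            # A-D
    (1/3, [("A", 1)]),                      # A
    (1/3, [("A", 0), ("D", 0), ("C", 1)]),  # A-D-C
    (1/3, [("A", 0), ("C", 1)]),            # A-C
]
print(predict(model, {"A": 1, "B": 0, "C": 0, "D": 0}))  # score 2/3 -> predicted as 1
```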
Processing Flow
When the learning ends (YES at S104), the creation unit 22 creates a linear model (a partial linear model) corresponding to each decision tree included in a created forest model (S105). Subsequently, the creation unit 22 creates a linear model equivalent to the forest model by using the partial linear models corresponding to the respective decision trees and the number of decision trees included in the forest model (S106).
Thereafter, when prediction start is instructed (YES at S107), the prediction unit 23 reads data to be predicted from the prediction target DB 15 (S108), and inputs the read data to be predicted to the linear model created at S106 to acquire a prediction result (S109). When there is remaining data to be predicted (NO at S110), the prediction unit 23 repeats steps at S108 and thereafter. When prediction is to be ended (YES at S110), the prediction unit 23 ends the processing.
In
As described above, the learning device 10 can create a partial linear model corresponding to each decision tree that can be created from training data, and can create a linear model corresponding to a forest model by using the sum of respective partial linear models. Therefore, the learning device 10 can build a learning model that acquires stable prediction. Further, since the learning device 10 can perform prediction with respect to the data to be predicted by one linear model, without performing determination by a decision tree and determination by the principle of majority rule, the processing time can be reduced.
Further, since the learning device 10 deterministically creates all the decision trees that can be created from the pieces of training data, the creation unit 22 can suppress bias in the decision trees, and the learning device 10 can thus suppress accuracy deterioration due to such bias. Further, since the learning device 10 creates a linear model from the decision trees corresponding to each piece of training data, the learning device 10 can show that there is no bias in the decision trees. Further, since the learning device 10 can suppress bias, the learning device 10 acquires the same result at all times from the same data, and even if the number of decision trees increases, the learning device 10 can perform learning and application at a high speed. That is, the learning device 10 can overcome the drawbacks of the random forest.
In the first embodiment, an example has been described in which a linear model is created after a forest model including a plurality of decision trees is once created. However, the method of creating a linear model is not limited thereto. For example, when the size of the forest model is sufficiently large as compared with the number of cubes, it may be easier to count, for each cube, the number of decision trees including the cube than to convert the decision trees one by one and totalize the conversion results. Therefore, in the second embodiment, the number of decision trees that would include each path is counted on the assumption that a forest model could be built, and the coefficients of a linear model are created without actually building the forest model, thereby reducing the calculation time at the time of learning.
Description of Learning Device
Specifically, the learning device 10 creates a cube C1 to a cube Ci (i is a natural number) from the training data. The learning device 10 then examines, for each cube, the decision trees T1 to Tn (n is a natural number) that include the cube and counts their number. In this manner, the learning device 10 specifies the cubes that can be created from the pieces of training data and the number of decision trees that can be considered to include each cube, and creates a linear model by using them. For example, when the number of trees including a cube C in a forest model F is denoted by "#F(C)", a linear model Y can be calculated as illustrated in Expression (1).
Y=Σi #F(Ci)·Ci/N   (1)
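For illustration, Expression (1) can be sketched in Python as follows, under the assumption that N is the total number of decision trees in the forest model F and that the counts #F(Ci) have already been obtained by the counting described below; the cube names and count values are hypothetical placeholders.

```python
# A minimal sketch of Expression (1): the linear model is obtained directly
# from the number of decision trees #F(Ci) that contain each cube Ci, without
# materializing the forest model itself.

def linear_model_from_counts(tree_counts, n_trees):
    # tree_counts: {cube Ci: #F(Ci)}, n_trees: total number of decision trees N (assumed).
    # Returns {cube: coefficient} such that Y = sum(coefficient * cube) over all cubes.
    return {cube: count / n_trees for cube, count in tree_counts.items()}

# Hypothetical counts for three cubes in a forest of 8 trees.
coeffs = linear_model_from_counts({"AB-": 2, "A-D": 5, "A-C": 3}, n_trees=8)
print(coeffs)  # {'AB-': 0.25, 'A-D': 0.625, 'A-C': 0.375}
```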
A simple decision tree is described here as an example.
In this case, in an example of a subset of n pieces of data, assuming that the number of variables is m, a forest model including about 2^(n+m) decision trees can be created with respect to the one subset. For example, in a case of n=200 and m=20, a forest model including about 2^220 decision trees can be obtained. Therefore, a method that actually performs decision making based on the principle of majority rule over all these trees requires a very large amount of time and is thus not realistic. In this manner, when the number of decision trees included in the forest model is very large, the method described in the second embodiment is particularly useful.
The learning unit 21 then obtains all the products of the combination number C(a, i) of selecting i pieces from a pieces and the combination number C(b, j) of selecting j pieces from b pieces, for each integer i from 0 to a inclusive and each integer j from 0 to b inclusive. Assuming that the rate of positive examples in the training data is "r=number of positive examples/n", the learning unit 21 obtains the totals # (P) and # (N) of the above numbers of combinations over the pairs (i, j) satisfying i/(i+j)>r and i/(i+j)<r, respectively.
In this manner, the learning unit 21 obtains, regarding the cube Q, the coefficient "# (P)−# (N)", the constant "# (N)", and the total number of trees "# (P)+# (N)". By totalizing the constants and the coefficients over all the cubes and dividing the total by the sum of the total numbers of decision trees over all the cubes, the learning unit 21 can acquire a linear model that returns a result equivalent to that of a forest model using simple decision trees.
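A Python sketch of this counting is given below purely for illustration. It assumes, as the description above suggests, that a and b are the numbers of positive and negative training examples associated with the cube Q, and that pairs with i/(i+j) equal to r, as well as the empty pair i=j=0, are excluded from both totals; the function names are hypothetical.

```python
# A minimal sketch of counting #(P) and #(N) for one cube Q and forming the
# corresponding partial linear model #(P)Q + #(N)(1 - Q) = #(N) + (#(P) - #(N))Q.
from math import comb

def count_trees_for_cube(a, b, r):
    # Sum C(a, i) * C(b, j) over (i, j); pairs with i/(i+j) > r go to #(P),
    # pairs with i/(i+j) < r go to #(N); i/(i+j) == r and i = j = 0 are skipped.
    n_pos = n_neg = 0
    for i in range(a + 1):
        for j in range(b + 1):
            if i + j == 0:
                continue
            weight = comb(a, i) * comb(b, j)
            if i / (i + j) > r:
                n_pos += weight
            elif i / (i + j) < r:
                n_neg += weight
    return n_pos, n_neg

def partial_linear_model(a, b, r):
    n_pos, n_neg = count_trees_for_cube(a, b, r)
    return {"constant": n_neg,
            "coefficient_of_Q": n_pos - n_neg,
            "total_trees": n_pos + n_neg}

# With a = 1, b = 3, and r = 0.5 this yields #(P) = 1 and #(N) = 11, i.e. the
# partial linear model 11 - 10Q over 12 counted trees, which appears to match
# the 11 - 10Q example discussed later in this section.
print(partial_linear_model(1, 3, 0.5))
```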
When the case is described with reference to
In a case of i/(i+j)=0.5=r from the table illustrated in
For example, the learning unit 21 counts, as "# (P)", the sum of the numbers of simple decision trees having a branch with "true (+)", which corresponds to the upper right of the diagonal line in the table. Further, the learning unit 21 counts, as "# (N)", the sum of the numbers of simple decision trees having a branch with "false (−)", which corresponds to the lower left of the diagonal line in the table. The creation unit 22 then calculates "# (P)Q+# (N)(1−Q)=# (N)+(# (P)−# (N))Q" as a linear model (a partial linear model) regarding the cube corresponding to
A specific example is explained with reference to
From (b) in
Subsequently, the learning unit 21 creates a table illustrated in (c) in
That is, the learning unit 21 calculates the total number of decision trees “# (P)” having the path to the leaf in the positive example as “1”, and the total number of decision trees “# (N)” having the path to the leaf in the negative example as “3+3+1+3+1=11”. As a result, the learning unit 21 can create “# (P)Q+# (N) (1−Q)=1Q+11(1−Q)=11−10Q” as a linear model corresponding to the cube Q.
Thereafter, the learning unit 21 performs the creation of a linear model with respect to each cube included in the table illustrated in (b) in
In this case, the creation unit 22 creates "Y=(sum of the respective linear models (X+Y+Z+G+H))/(sum of the total numbers of decision trees corresponding to the respective linear models (x+y+z+g+h))" as the final linear model (learning model). As a result, the learning device 10 can create a linear model, which is a learned learning model, without actually building a forest model, and can reduce the calculation time at the time of learning.
Processing Flow
Subsequently, the learning device 10 specifies the literals in the training data to extract a plurality of cubes (S203). The learning device 10 then selects one cube (S204), creates a linear model corresponding to the cube, and counts the number of decision trees including the cube (S205).
When there is a non-processed cube (NO at S206), processes at S204 and thereafter are repeated. On the other hand, when processing is completed regarding all the cubes (YES at S206), the learning device 10 calculates the sum of linear models corresponding to the respective cubes (S207), and adds the number of decision trees including the respective cubes (S208).
The learning device 10 then creates a linear model equivalent to a forest model by dividing the sum of the linear models by the sum of the numbers of decision trees (S209). Since the prediction processing is the same as that of the first embodiment, detailed descriptions thereof are omitted.
As described above, the learning device 10 can obtain the result of a forest model including about 2^(n+m) trees based on the principle of majority rule by performing only about 2^m×n^2 counting operations, and can thus increase the processing speed.
In the method of (1), # (P) becomes 0, and in the method of (2), # (N) becomes 0; thus, obtaining an extreme result by sampling cannot be suppressed. In contrast, according to the method of the second embodiment, all the decision trees can be examined without omission. Therefore, # (P) and # (N) can both be examined, which makes it possible to suppress the creation of a learning model biased toward an extreme result.
While embodiments of the present invention have been described above, the present invention can be implemented in various different modes other than the embodiments described above.
Limitation of Path
For example, the paths to be used for the linear model can be limited. Specifically, the paths are limited to those in which the number of literals of explanatory variables is equal to or smaller than a predetermined value. One literal corresponds to the fact that the value of a certain explanatory variable is 1 (positive: true) or 0 (negative: false). Therefore, the condition that the number of literals is equal to or smaller than a predetermined value limits the number of relevant explanatory variables to be equal to or smaller than a predetermined number, thereby enhancing the generality of learning.
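As an illustration only, such a limitation can be sketched in Python as a simple filter over the cubes; the cube encoding and the threshold value are hypothetical.

```python
# A minimal sketch of limiting the paths (cubes) used in the linear model to
# those with at most a predetermined number of literals.

def limit_cubes(cubes, max_literals):
    # Keep only the cubes whose number of literals does not exceed the limit,
    # which bounds the number of explanatory variables appearing in each term.
    return [cube for cube in cubes if len(cube) <= max_literals]

cubes = [
    [("A", 1), ("B", 0)],                        # AB-  (two literals)
    [("A", 0), ("B", 1), ("C", 1), ("D", 0)],    # four literals
]
print(limit_cubes(cubes, max_literals=3))        # only the two-literal cube remains
```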
System
The information including the processing procedure, the control procedure, specific names, and various kinds of data and parameters illustrated in the above description or in the drawings can be optionally changed, unless otherwise specified.
The respective constituents of the illustrated apparatus are functionally conceptual and do not necessarily need to be physically configured as illustrated. In other words, the specific mode of dispersion and integration of the device is not limited to the illustrated one, and all or a part thereof may be functionally or physically dispersed or integrated in an optional unit according to the various kinds of load and the status of use. For example, learning and prediction can be implemented by separate devices, such as a learning device including the learning unit 21 and the creation unit 22, and a discrimination device including the prediction unit 23.
All or an optional part of the various processing functions performed by the respective devices can be realized by a CPU or by a program analyzed and executed by the CPU, or can be realized as hardware by wired logic.
Hardware
The communication device 10a is a network interface card or the like, and performs communication with other servers. The HDD 10b stores therein a program that causes the functions illustrated in
The processor 10d reads a program for executing the same processing as that of the respective processing units illustrated in
In this manner, the learning device 10 operates as an information processing device that performs a prediction method by reading and executing the program. Further, the learning device 10 can realize the same functions as those of the embodiments described above by reading the program from a recording medium by a media reader and executing the read program. Programs in other embodiments are not limited to being executed by the learning device 10. For example, the present invention can be applied to a case in which other computers or servers execute the program, or in which the computers or servers cooperate with each other to execute the program.
According to the embodiments, it is possible to reduce a processing time for acquiring stable prediction.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.