The present disclosure relates to the creation of training cases for use in machine learning.
In a case where the number of training cases used for machine learning is not sufficient, artificially generated cases (hereinafter referred to as “artificial cases”) may be used as training cases. For example, Non-Patent Document 1 discloses a technique for generating artificial cases similar to actual cases close to a decision boundary. Non-Patent Documents 2 and 3 disclose methods for generating artificial cases.
However, in the above techniques, the generated artificial cases do not necessarily contribute to improving the prediction performance of a machine learning model.
It is one object of the present disclosure to provide an information processing device capable of generating artificial cases which contribute to improving the prediction performance of a machine learning model.
According to an example aspect of the present disclosure, there is provided an information processing device including:
According to another example aspect of the present disclosure, there is provided an information processing method including:
According to still another example aspect of the present disclosure, there is provided a recording medium storing a program, the program causing a computer to perform a process including:
According to the present disclosure, it is possible to generate artificial cases which contribute to improving a prediction performance of a machine learning model.
In the following, example embodiments will be described with reference to the accompanying drawings.
A principle of a method according to an example embodiment will be described.
First, an example of a method for creating training cases to be used for machine learning will be described as a basic method. In machine learning, the accuracy of an acquired machine learning model may be improved by adding to the training cases not only actual cases which have been observed but also artificial cases made to resemble the actual cases. However, even if artificial cases are added at random, it is difficult to efficiently improve the accuracy of the machine learning model. Therefore, in the basic method, actual cases in which the prediction of the machine learning model is uncertain, that is, actual cases in which the prediction is difficult, are selected, and a plurality of artificial cases similar to those actual cases are generated and added to the training cases. By repeating this process, the training cases are increased and the prediction accuracy of the machine learning model is improved.
The basic method first obtains actual cases close to a decision boundary, and generates a predetermined number of artificial cases (v artificial cases) similar to the acquired actual cases. In an example of
Next, the basic method reconstructs the SVM by adding the v generated artificial cases to the training cases. After that, the basic method acquires actual cases in which the prediction is uncertain based on the reconstructed SVM, and generates artificial cases similar to the acquired actual cases. The basic method outputs the generated artificial cases after this process has been repeated a certain number of times.
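For illustration only, the following is a minimal sketch of the basic method described above, assuming binary labels, a linear scikit-learn SVM, and simple interpolation toward same-label actual cases for synthesis; the function and parameter names are hypothetical and not part of the disclosure.

```python
# A minimal sketch of the basic method (assumptions: binary labels, a linear
# scikit-learn SVC, and interpolation toward same-label cases for synthesis).
import numpy as np
from sklearn.svm import SVC

def basic_method(X, y, v=5, n_rounds=3, seed=0):
    rng = np.random.default_rng(seed)
    X_train, y_train = X.copy(), y.copy()
    for _ in range(n_rounds):
        model = SVC(kernel="linear").fit(X_train, y_train)
        # Actual case closest to the decision boundary (prediction most uncertain).
        i = int(np.argmin(np.abs(model.decision_function(X))))
        # Generate v artificial cases similar to that actual case.
        partners = rng.choice(np.flatnonzero(y == y[i]), size=v)
        lam = rng.uniform(0.0, 1.0, size=(v, 1))
        X_art = X[i] + lam * (X[partners] - X[i])
        # Reconstruct the SVM with the v artificial cases added to the training cases.
        X_train = np.vstack([X_train, X_art])
        y_train = np.concatenate([y_train, np.full(v, y[i])])
    return X_train, y_train
```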
However, the artificial cases obtained by the above basic method do not always improve the prediction accuracy of the machine learning model. This is because the basic method mainly has the following two problems.
A first problem is that the artificial case generated from an uncertain actual case is not necessarily similarly uncertain.
A second problem is redundancy in a case where a plurality of artificial cases generated from the same actual case are used as training cases. Since the v artificial cases generated from the same actual case by the basic method are similar to each other, the larger the number v of artificial cases, the more similar artificial cases are added to the training cases, and the less they contribute to improving the prediction performance. In addition, by adding only similar artificial cases, the distribution of the training cases may deviate from the distribution of the original actual cases and adversely affect the prediction accuracy. The second problem can be suppressed by reducing the number v of artificial cases, but then the first problem described above becomes larger. In other words, in a case where the number v of artificial cases is large, it is more likely that good artificial cases will be added by chance, but if the number v is small, only artificial cases that do not contribute to improving the performance may be added.
In view of the above problems, a technique of the example embodiment performs the following processes.
According to this technique, artificial cases in which the predictions are less uncertain are no longer added to the training cases, so that only artificial cases in which the predictions are actually uncertain are added to the training cases. Thus, the first problem described above is solved. In addition, by excluding the artificial cases in which the predictions are less uncertain, similar artificial cases are not added to the training cases, so that the second problem described above is also solved. Note that since artificial cases are generally generated by synthesizing cases, the cost of generating artificial cases is low. In contrast, the computational cost of machine learning due to an increased number of training cases is high. Therefore, as in the method of the example embodiment, it is more efficient to create a large number of artificial cases at once and add only good cases to the training cases, because the computational cost of the machine learning is reduced.
In contrast, since the technique in the example embodiments selects artificial cases in which the prediction is uncertain from the generated artificial cases, it is possible to add cases at places in which the prediction of the machine learning model is uncertain, without excessively generating cases at similar places in the feature space, as depicted in
Next, an artificial case generation device 100 according to a first example embodiment will be described. The artificial case generation device 100 generates artificial cases to be added to the training cases based on the actual cases.
The interface 11 inputs and outputs data to and from an external device. Specifically, the interface 11 acquires the actual cases from outside.
The processor 12 is a computer such as a CPU (Central Processing Unit) and controls the entire artificial case generation device 100 by executing programs prepared in advance. The processor 12 may be a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array). The processor 12 executes the artificial case generation process to be described later.
The memory 13 consists of a ROM (Read Only Memory) and a RAM (Random Access Memory). The memory 13 is also used as a working memory during various processing operations of the processor 12.
The recording medium 14 is a non-volatile and non-transitory recording medium such as a disk-shaped recording medium, a semiconductor memory, or the like, and is configured to be detachable with respect to the artificial case generation device 100. The recording medium 14 records various programs executed by the processor 12. In a case where the artificial case generation device 100 executes various kinds of processes, the program recorded in the recording medium 14 is loaded into the memory 13 and executed by the processor 12. The DB 15 stores the actual cases input through the interface 11 and the artificial cases generated based on the actual cases.
The input unit 21 acquires a plurality of actual cases, and outputs them to the artificial case generation unit 22. The artificial case generation unit 22 selects actual cases from the plurality of input actual cases by some method. A method for selecting the actual cases will be described later. Then, the artificial case generation unit 22 generates a plurality of artificial cases using the selected actual cases, and outputs the generated artificial cases to the artificial case selection unit 23. Note that the process performed by the artificial case generation unit 22 corresponds to the process 1 described above.
The artificial case selection unit 23 selects artificial cases in which the predictions are uncertain from the plurality of generated artificial cases, and outputs them to the output unit 24. The method for selecting the artificial cases in which the predictions are uncertain will be explained in detail later. Incidentally, the process executed by the artificial case selection unit 23 corresponds to the process 2 described above. Then, the output unit 24 adds the input artificial cases to the training cases to be used for training the machine learning model.
Next, the artificial case selection unit 23 will be described in detail. The artificial case selection unit 23 selects each artificial case to be added as the training case from the plurality of artificial cases generated by the artificial case generation unit 22.
First, a method for selecting artificial cases by the artificial case selection unit 23 will be described.
In a method 1, the artificial case selection unit 23 selects each “artificial case in which the prediction is uncertain” as described with reference to
In a method 2, instead of simply selecting each artificial case in which the prediction is uncertain, the artificial case selection unit 23 selects “a plurality of artificial cases in which the predictions are uncertain and which are not similar to each other”. Since non-similar artificial cases are added by this selection without selecting similar, redundant artificial cases, it is possible to improve the efficiency of learning, and the second problem described above is further suppressed. In detail, as the method 2, any of the following three methods is used.
In a method 2-1, the artificial case selection unit 23 calculates a degree of similarity between the artificial cases, and selects the artificial cases so that they are not similar to each other.
Next, in step S14, the artificial case selection unit 23 selects artificial cases with higher uncertainty from the plurality of artificial cases in which the predictions are uncertain, so that the selected cases are not similar to each other. Specifically, the artificial case selection unit 23 calculates the degree of similarity between the artificial cases, and does not select an artificial case having a high degree of similarity to an artificial case which has already been selected. Thus, artificial cases which are not similar to each other are selected. After that, in step S15, the output unit 24 adds the selected artificial cases to the training cases.
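As a reference only, the following is a minimal sketch of the method 2-1, assuming that an uncertainty score has already been computed for each generated artificial case (for example, by one of the active learning criteria described later); the function name, the use of cosine similarity, and the threshold are illustrative assumptions.

```python
# A minimal sketch of method 2-1 (assumption: `uncertainty` holds a score per
# artificial case; cosine similarity and the threshold are illustrative choices).
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def select_dissimilar_uncertain(X_art, uncertainty, n_select=10, sim_threshold=0.95):
    selected = []
    for i in np.argsort(-uncertainty):            # most uncertain first
        if len(selected) >= n_select:
            break
        # Skip an artificial case that is highly similar to one already selected.
        if selected and cosine_similarity(X_art[i:i + 1], X_art[selected]).max() >= sim_threshold:
            continue
        selected.append(i)
    return np.asarray(selected)
```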
In a method 2-2, the artificial case selection unit 23 selects the artificial cases so that the actual cases closest to the respective artificial cases to be acquired do not match each other.
Next, in step S24, the artificial case selection unit 23 selects artificial cases from the plurality of artificial cases in which the predictions are uncertain, so that the actual cases with the closest distance do not match. Specifically, the artificial case selection unit 23 determines, for each of the artificial cases having high uncertainty, the actual case whose distance in the feature space is closest (hereinafter referred to as a “closest neighbor actual case”), and selects a plurality of artificial cases so that the closest neighbor actual cases are different from each other. For instance, the artificial case selection unit 23 selects one artificial case from each group of artificial cases having the same actual case as their closest neighbor actual case. Thus, artificial cases that are not similar to each other are selected. After that, in step S25, the output unit 24 adds the selected artificial cases to the training cases.
In this case, as the distance between the artificial case and the actual case, the artificial case selection unit 23 may use a Euclidean distance, may use a distance other than the Euclidean distance, or may use a similarity such as a cosine similarity.
Moreover, instead of selecting the artificial cases so that the closest neighbor actual cases do not match as described above, the artificial case selection unit 23 may select the artificial cases so that, among a predetermined number of neighbor cases having closer distances (K neighbor cases), a predetermined number of neighbor cases (M neighbor cases, where M ≤ K) do not match each other.
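A minimal sketch of the method 2-2 is shown below, assuming a Euclidean distance and keeping at most one artificial case per closest neighbor actual case; the names are illustrative only. Extending the check from the single closest neighbor to the K-neighbor/M-match variant above would only require comparing sets of neighbor indices instead of a single index.

```python
# A minimal sketch of method 2-2 (assumption: Euclidean distance; at most one
# artificial case is kept per closest neighbor actual case).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_by_distinct_neighbors(X_art, uncertainty, X_actual, n_select=10):
    nn = NearestNeighbors(n_neighbors=1).fit(X_actual)
    closest = nn.kneighbors(X_art, return_distance=False)[:, 0]
    selected, used = [], set()
    for i in np.argsort(-uncertainty):            # most uncertain first
        if len(selected) >= n_select:
            break
        if closest[i] in used:                    # closest neighbor actual case already used
            continue
        used.add(int(closest[i]))
        selected.append(i)
    return np.asarray(selected)
```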
In a method 2-3, the artificial case selection unit 23 selects the artificial cases so that the actual cases serving as their generation sources do not match each other. Specifically, in response to the generation of a plurality of artificial cases from actual cases by the artificial case generation unit 22, the artificial case selection unit 23 pairs each artificial case with the actual case serving as its generation source. Next, the artificial case selection unit 23 calculates the uncertainty for each artificial case, and acquires one or more artificial cases in descending order of uncertainty. At this time, the artificial case selection unit 23 does not acquire an artificial case which is paired with the same actual case as another artificial case already acquired, that is, does not acquire an artificial case whose generation source is the same actual case as that of an artificial case already acquired. As a result, a plurality of artificial cases with the same actual case as the generation source are not selected at the same time. In this manner, the artificial case selection unit 23 acquires a certain number of artificial cases. After that, the output unit 24 adds each selected artificial case to the training cases.
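Under the same assumptions, a minimal sketch of the method 2-3 follows; `source_idx` is a hypothetical array recording, for each artificial case, the index of the actual case that was its generation source.

```python
# A minimal sketch of method 2-3 (assumption: source_idx[i] is the index of the
# actual case from which artificial case i was generated).
import numpy as np

def select_by_distinct_sources(uncertainty, source_idx, n_select=10):
    selected, used_sources = [], set()
    for i in np.argsort(-uncertainty):            # most uncertain first
        if len(selected) >= n_select:
            break
        if source_idx[i] in used_sources:         # same generation source already selected
            continue
        used_sources.add(int(source_idx[i]))
        selected.append(i)
    return np.asarray(selected)
```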
(2) Method for Selecting Cases with Uncertain Predictions
Next, a method for selecting cases with uncertain predictions will be described. In the present example embodiment, active learning is utilized as an index to select cases in which the predictions are uncertain. Active learning is a technique to find cases which cannot be predicted well by a current machine learning model, and to have an Oracle assign labels to those cases. The accuracy of the machine learning model can be improved by retraining with the cases to which the Oracle has assigned labels added. Note that the Oracle may be a human or a machine learning model.
In the present example embodiment, the artificial case selection unit 23 selects, as an artificial case in which the prediction is uncertain, an artificial case whose prediction is determined to be uncertain when evaluated by a criterion used in active learning. In other words, the artificial case selection unit 23 selects each artificial case that would be subject to a query to the Oracle (hereinafter also referred to as a “query case”) when evaluated by a technique of active learning, as an artificial case in which the prediction is uncertain. Hereinafter, specific techniques of active learning will be described in detail. Note that a technique of active learning other than the following three techniques may be used.
Query by committee can be used as a technique of active learning.
For instance, in a case of using a vote entropy, which is one of the Query by committee methods, the vote entropy value can be used to determine the query case. In the vote entropy, a case in which the entropy of the voting results by a plurality of classifiers is maximum (that is, the case in which the vote is the most split) is regarded as the query case. In detail, a case x̂ given by the following equation is the query case. Note that in the present specification, the letter “x” with a circumflex added above it is written as “x̂”.
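The equation itself is not reproduced in this text; for reference, a standard form of the vote entropy criterion is sketched below, where C is assumed to denote the number of committee members and V(y) the number of votes cast for label y.

```latex
\hat{x} = \operatorname*{arg\,max}_{x}
  \left( -\sum_{y} \frac{V(y)}{C} \log \frac{V(y)}{C} \right)
```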
The vote entropy value is the quantity indicated in parentheses in the formula (2). Therefore, in a case of using the vote entropy, the artificial case selection unit 23 may regard each artificial case whose vote entropy value is equal to or greater than a certain value as an artificial case in which the prediction is uncertain.
As another method of active learning, Uncertainty sampling can be used. Specifically, the Least confident criterion in the Uncertainty sampling can be used as an indicator of the uncertainty of the prediction. In this case, as depicted in the following equation, the case x̂ in which the probability of the label with the maximum probability is minimum is regarded as the query case.
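The equation is likewise not reproduced here; a standard form of the Least confident criterion, under the assumption that P_θ(y|x) denotes the probability the current model assigns to label y for case x, is as follows.

```latex
\hat{x} = \operatorname*{arg\,min}_{x}
  \left( \max_{y} P_{\theta}(y \mid x) \right)
```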
Therefore, in a case of using the Least confident, the artificial case selection unit 23 may regard the case x̂ in which the value V1 in parentheses in the equation (3) is less than a certain value, as an artificial case in which the prediction is uncertain.
Also, Margin sampling in the Uncertainty sampling can be used as an indicator of the uncertainty of the prediction. In this case, as expressed in the following equation, the case x̂ in which the difference between the probability of the most likely label and the probability of the second most likely label is minimum is regarded as the query case.
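A standard form of the Margin sampling criterion is sketched below for reference, where ŷ1 and ŷ2 are assumed to denote the most likely and second most likely labels for case x under the current model.

```latex
\hat{x} = \operatorname*{arg\,min}_{x}
  \left( P_{\theta}(\hat{y}_1 \mid x) - P_{\theta}(\hat{y}_2 \mid x) \right)
```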
Therefore, in a case of using the Margin sampling, the artificial case selection unit 23 may regard the case x̂ in which the value V2 in parentheses in the equation (4) is less than a certain value, as an artificial case in which the prediction is uncertain.
Next, the artificial case generation unit 22 will be described in detail.
First, a method for selecting the actual cases serving as sources of the artificial cases will be described. Basically, the artificial case generation unit 22 may select the actual cases in any manner. Accordingly, for instance, the artificial case generation unit 22 may generate the artificial cases using all actual cases, or may generate the artificial cases using actual cases randomly selected from all actual cases.
However, since the artificial case selection unit 23 selects, as the artificial cases to be added to the training cases, the artificial cases in which the prediction is uncertain from among the generated artificial cases, it is desirable that the actual case serving as the generation source of an artificial case be an actual case from which an artificial case with an uncertain prediction is likely to be generated. From this point of view, the active learning described above can also be used to select the actual cases. That is, the artificial case generation unit 22 selects, from a plurality of actual cases, actual cases in which the prediction is uncertain using a method of active learning, and generates the plurality of artificial cases using the selected actual cases.
Next, in step S33, the artificial case generation unit 22 generates artificial cases from the selected actual cases. Each generated artificial case is output to the artificial case selection unit 23. Next, in step S34, the artificial case selection unit 23 selects artificial cases in which the prediction is uncertain from the input artificial cases. In this case, an active learning method is used twice: when the artificial case generation unit 22 selects the actual cases, and when the artificial case selection unit 23 selects the artificial cases in which the prediction is uncertain.
Next, a method for generating the artificial cases by the artificial case generation unit 22 will be described. The artificial case generation unit 22 generates an artificial case by synthesizing the actual case serving as the generation source and another actual case. In one method, the artificial case generation unit 22 can generate each artificial case using the equation (1) described above. Moreover, the artificial case generation unit 22 can also use an artificial case generation technique such as MUNGE disclosed in Non-Patent Document 2 or SMOTE disclosed in Non-Patent Document 3.
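The disclosure does not fix a single synthesis routine; as one illustrative possibility, the following sketch generates artificial cases by SMOTE-style interpolation between a source actual case and its nearest same-label neighbors. The function name and parameters are assumptions, and at least two actual cases of the source's label are assumed to exist.

```python
# A SMOTE-style synthesis sketch (assumptions: at least two actual cases share
# the source's label; parameters k and n_new are illustrative).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def generate_artificial_cases(X, y, source_idx, n_new=10, k=5, seed=0):
    rng = np.random.default_rng(seed)
    same = np.flatnonzero(y == y[source_idx])     # actual cases with the same label
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(same))).fit(X[same])
    idx = nn.kneighbors(X[source_idx:source_idx + 1], return_distance=False)[0]
    neighbors = same[idx[1:]]                     # drop the source case itself
    partners = rng.choice(neighbors, size=n_new)
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1))
    # Interpolate between the source actual case and a same-label neighbor.
    return X[source_idx] + lam * (X[partners] - X[source_idx])
```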
Next, an artificial case generation process by the artificial case generation device 100 will be described.
First, the input unit 21 acquires the actual cases (step S41). Next, the artificial case generation unit 22 generates artificial cases based on the acquired actual cases (step S42). At this time, as the actual cases serving as the generation sources of the artificial cases, the artificial case generation unit 22 may use all actual cases, may use randomly selected actual cases, or may use the actual cases in which the predictions are uncertain selected by a technique of active learning, as described above. In addition, as a method for generating the artificial cases, the artificial case generation unit 22 may use the equation (1), or may use a technique such as MUNGE or SMOTE. The artificial case generation unit 22 outputs the generated artificial cases to the artificial case selection unit 23.
Next, from the input artificial cases, the artificial case selection unit 23 selects artificial cases in which the prediction is uncertain (step S43). At this time, the artificial case selection unit 23 selects the artificial cases by any of the method 1, the method 2-1, the method 2-2, and the method 2-3 described above. The artificial case selection unit 23 outputs each selected artificial case to the output unit 24. Next, the output unit 24 outputs the input artificial cases, that is, the artificial cases selected by the artificial case selection unit 23, as the training cases (step S44).
Next, the artificial case generation device 100 determines whether or not an end condition is satisfied (step S45). For instance, when a necessary predetermined number of artificial cases are obtained, the artificial case generation device 100 determines that the end condition is satisfied. When the end condition is not satisfied (step S45: No), the process returns to step S41, and steps S41 to S45 are repeated. On the other hand, when the end condition is satisfied (step S45: Yes), the process is terminated.
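Purely as an illustration of the flow of steps S41 to S45, the following sketch puts the pieces together, assuming binary labels, a linear scikit-learn SVC whose decision-function margin serves as the uncertainty measure, and a simple interpolation-based generator; all names, thresholds, and the end condition are assumptions for this sketch.

```python
# An end-to-end sketch of steps S41-S45 (assumptions: binary labels, a linear
# SVC whose margin measures uncertainty, interpolation-based generation, and an
# end condition of collecting n_needed artificial cases).
import numpy as np
from sklearn.svm import SVC

def run_artificial_case_generation(X, y, n_needed=50, n_candidates=200,
                                   margin=0.2, seed=0):
    rng = np.random.default_rng(seed)
    X_train, y_train = X.copy(), y.copy()
    artificial = []
    # Step S45: repeat until the end condition is satisfied.
    # (A real implementation would also bound the number of iterations.)
    while len(artificial) < n_needed:
        model = SVC(kernel="linear").fit(X_train, y_train)
        # Step S42: generate candidate artificial cases from randomly chosen sources.
        src = rng.integers(len(X), size=n_candidates)
        partners = np.array([rng.choice(np.flatnonzero(y == y[s])) for s in src])
        lam = rng.uniform(0.0, 1.0, size=(n_candidates, 1))
        X_cand = X[src] + lam * (X[partners] - X[src])
        # Step S43: keep only candidates whose prediction is uncertain (small margin).
        keep = np.abs(model.decision_function(X_cand)) < margin
        X_sel, y_sel = X_cand[keep], y[src][keep]
        # Step S44: output the selected cases and add them to the training cases.
        artificial.extend(X_sel)
        X_train = np.vstack([X_train, X_sel])
        y_train = np.concatenate([y_train, y_sel])
    return np.asarray(artificial[:n_needed]), X_train, y_train
```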
In the example embodiment described above, the artificial case generation device 100 outputs each artificial case without a label, but may instead output the artificial case with a label. For instance, the output unit 24 may assign a label to each artificial case input from the artificial case selection unit 23, and may output the labeled artificial case. In this case, the output unit 24 may assign, to the input artificial case, the same label as that of the actual case which has been the generation source. Alternatively, the output unit 24 may assign, to the input artificial case, a label assigned by a machine learning model prepared in advance. Note that the label may be manually assigned to the artificial case, which may then be output as the labeled artificial case.
According to the information processing device 70 of the second example embodiment, it becomes possible to generate artificial cases which contribute to improving the prediction performance of the machine learning model.
A part or all of the example embodiments described above may also be described as the following supplementary notes, but not limited thereto.
An information processing device comprising:
an artificial case selection means configured to select each artificial case in which a prediction of a machine learning model is uncertain, from the plurality of artificial cases; and an output means configured to output each selected artificial case.
The information processing device according to supplementary note 1, wherein the artificial case selection means selects the plurality of artificial cases so that each selected artificial case is different.
The information processing device according to supplementary note 1 or 2, wherein the artificial case selection means selects the plurality of artificial cases so that actual cases existing in a vicinity are different in a feature space.
The information processing device according to supplementary note 1 or 2, wherein the artificial case selection means selects the plurality of artificial cases so that actual cases to be generation sources for respective artificial cases are different from each other.
The information processing device according to any one of supplementary notes 1 to 4, wherein the artificial case generation means generates the artificial cases using all input actual cases.
The information processing device according to any one of supplementary notes 1 to 4, wherein the artificial case generation means generates the artificial cases using a plurality of actual cases randomly selected from among the input actual cases.
The information processing device according to any one of supplementary notes 1 to 4, wherein the artificial case generation means selects each actual case in which a prediction of a machine learning model is uncertain from among a plurality of the input actual cases, and generates the plurality of artificial cases using each selected actual case.
The information processing device according to any one of supplementary notes 1 to 7, wherein the output means assigns a label to each selected artificial case and outputs each labeled artificial case.
An information processing method comprising:
A recording medium storing a program, the program causing a computer to perform a process comprising:
While the disclosure has been described with reference to the example embodiments and examples, the disclosure is not limited to the above example embodiments and examples. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the claims.