The present invention relates to a learning apparatus, an estimation apparatus, a learning method, an estimation method, and a program.
Deep learning models are known to be able to execute tasks with high accuracy. For example, it has been reported that accuracy exceeding that of humans has been achieved in the task of image recognition.
On the other hand, it is known that a deep learning model behaves in unintended ways for unknown data and for data trained with an erroneous label (label noise). For example, an image recognition model trained on an image recognition task may fail to estimate a correct class label for an unknown image. In addition, an image recognition model trained on a pig image mistakenly labeled “rabbit” may estimate that the class label of the pig image is “rabbit.” A deep learning model exhibiting such behavior is not preferable in practical use.
Odena, Augustus, Christopher Olah, and Jonathon Shlens. “Conditional Image Synthesis with Auxiliary Classifier GANs.” International Conference on Machine Learning, 2017.
Therefore, it is necessary to take measures in accordance with the cause of the estimation error. For example, if unknown data is the cause, the unknown data needs to be added to the training set. If the label noise is the cause, the label needs to be corrected.
However, it is difficult for a human to accurately estimate the cause of an error.
The present invention has been made in view of the above points, and an object of the present invention is to make it possible to automatically estimate the cause of an error made by a deep learning model.
In order to solve the above problem, a learning apparatus includes: a data generation unit that learns generation of data based on a class label signal and a noise signal; an unknown degree estimation unit that learns estimation of a degree to which input data is unknown using a training set and the data generated by the data generation unit; a first class likelihood estimation unit that learns estimation of a first likelihood of each class label for input data using the training set; a second class likelihood estimation unit that learns estimation of a second likelihood of each class label for input data using the training set and the data generated by the data generation unit; a class likelihood correction unit that generates a third likelihood by correcting the first likelihood on the basis of the unknown degree and the second likelihood; and a class label estimation unit that estimates a class label of data related to the third likelihood on the basis of the third likelihood, and the data generation unit learns the generation on the basis of the unknown degree and the class label estimated by the class label estimation unit.
It is possible to automatically estimate the cause of an error made by a deep learning model.
In the present embodiment, a model (deep neural network (DNN)) based on an auxiliary classifier generative adversarial network (ACGAN) is disclosed. Therefore, first, the ACGAN will be briefly described.
That is, in
Embodiments of the present invention will be described below with reference to the drawings.
A program that realizes processing in the class label estimation apparatus 10 is provided by a recording medium 101 such as a CD-ROM. When the recording medium 101 in which the program is stored is set in the drive device 100, the program is installed from the recording medium 101 to the auxiliary storage device 102 through the drive device 100. The program may not necessarily be installed from the recording medium 101 and may be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed program and stores necessary files, data, and the like.
The memory device 103 reads the program from the auxiliary storage device 102 and stores it when an instruction to start the program is received. The processor 104 is a CPU, a graphics processing unit (GPU), or a combination of a CPU and a GPU, and executes functions related to the class label estimation apparatus 10 according to the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network.
The data generation unit 11 is the generator in ACGAN. That is, using a noise signal and a class label signal as inputs, the data generation unit 11 generates data (for example, image data) that corresponds to the label indicated by the class label signal and resembles actual data (data that actually exists). At the time of learning, the data generation unit 11 learns so that the unknown degree estimation unit 12 estimates the generated data to be actual data. The data generation unit 11 is not used at the time of inference (when estimating the class label of actual data during operation).
The unknown degree estimation unit 12 is the discriminator in ACGAN. That is, the unknown degree estimation unit 12 takes as input either the generated data generated by the data generation unit 11 or the actual data included in the training set, and outputs an unknown degree for the input data (a continuous value indicating the degree to which the data is generated data). The unknown degree estimation unit 12 performs threshold processing on the unknown degree. By using the data generated by the data generation unit 11 for learning of the unknown degree estimation unit 12, the unknown degree estimation unit 12 can be trained so that unknown data outside the training set can be explicitly discriminated as unknown.
The class likelihood estimation unit 13 and the class label estimation unit 14 constitute an auxiliary classifier in ACGAN.
The class likelihood estimation unit 13 uses the same input data as the input data to the unknown degree estimation unit 12 as an input, and estimates (calculates) the likelihood of each label for the input data. The likelihood is calculated in a softmax layer in the deep learning model. Therefore, the likelihood of each label is expressed by the softmax vector. The class likelihood estimation unit 13 is trained using both the generated data and the actual data.
The class label estimation unit 14 estimates the label of the input data on the basis of the likelihood of each label estimated by the class likelihood estimation unit 13.
The label noise degree estimation unit 15 and the cause estimation unit 16 are mechanisms added to the ACGAN in the first embodiment in order to estimate the cause of an error in estimation by the ACGAN.
The label noise degree estimation unit 15 estimates a label noise degree which is a degree of influence of label noise (label error in the training set) on the basis of the likelihood of each label estimated by the class likelihood estimation unit 13.
When there is no influence of label noise, the softmax vector becomes a sharp vector such as [1.00, 0.00, 0.00], in which the likelihood of one class is overwhelmingly close to 1. On the other hand, when there is an influence of label noise, it becomes a flat vector such as [0.33, 0.33, 0.33], in which the likelihoods of all classes have similar values. The flatness of the softmax vector can therefore be said to represent the label noise degree. Accordingly, the label noise degree estimation unit 15 outputs, for example, the maximum value of the softmax vector, the difference between the top two values, or the entropy as the label noise degree.
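The flatness indices mentioned above can be sketched as follows. This is an illustrative example only; the function name and the small constant added for numerical stability are assumptions, not part of the embodiment.

```python
import numpy as np

def label_noise_degrees(softmax: np.ndarray) -> dict:
    """Flatness metrics of a softmax vector, usable as label noise degrees.

    A flat vector (suggesting label noise) yields a low maximum value,
    a small gap between the top two values, and high entropy;
    a sharp vector yields the opposite.
    """
    top2 = np.sort(softmax)[::-1][:2]
    return {
        "max_prob": float(top2[0]),             # maximum value of the vector
        "diff_prob": float(top2[0] - top2[1]),  # difference between top two values
        # 1e-12 avoids log(0); an assumed numerical-stability constant
        "entropy": float(-np.sum(softmax * np.log(softmax + 1e-12))),
    }
```

For a sharp vector such as [1.00, 0.00, 0.00], max_prob is high and entropy near zero; for a flat vector such as [0.33, 0.33, 0.33], the relations reverse.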
The cause estimation unit 16 uses the unknown degree estimated by the unknown degree estimation unit 12 and the label noise degree estimated by the label noise degree estimation unit 15 to estimate whether erroneous recognition may occur because the data whose label is to be estimated is unknown, whether erroneous recognition may occur due to label noise, or whether there is no problem and no erroneous recognition occurs (that is, the cause estimation unit 16 estimates the cause of the error). For example, the cause estimation unit 16 determines the output by performing threshold processing on each of the unknown degree and the label noise degree.
A specific example of the threshold processing will be described. On the assumption that the unknown degree is an index that becomes large only for unknown data and the label noise degree is an index that becomes large only for label noise data, a threshold α for the unknown degree and a threshold β for the label noise degree are set. The cause estimation unit 16 estimates unknown data as the cause when the unknown degree is higher than the threshold α, and estimates label noise as the cause when the label noise degree is higher than the threshold β. When the unknown degree is equal to or less than the threshold α and the label noise degree is equal to or less than the threshold β, the cause estimation unit 16 estimates that there is no problem (with the estimation of the label).
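The threshold processing above can be sketched as a small function. The function name is illustrative, and the ordering (checking the unknown degree before the label noise degree when both exceed their thresholds) is an assumption; the text does not specify a priority for that case.

```python
def estimate_cause(unknown_degree: float, label_noise_degree: float,
                   alpha: float, beta: float) -> str:
    """Threshold processing of the two degrees (a sketch).

    Returns which cause is estimated: unknown data, label noise,
    or no problem when both degrees are at or below their thresholds.
    """
    if unknown_degree > alpha:          # unknown degree exceeds threshold α
        return "unknown data"
    if label_noise_degree > beta:       # label noise degree exceeds threshold β
        return "label noise"
    return "no problem"
```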
As described above, the configuration shown in
However, with respect to the above configuration, the inventor of the present application has confirmed that the performance of detecting label noise is low and that unknown data is also determined as label noise.
In addition, “max_prob,” “diff_prob,” and “entropy” on the horizontal axis correspond, in that order, to the case where the maximum value of the softmax vector is used as the label noise degree, the case where the difference between the top two values is used, and the case where the entropy is used. Each plot on
According to
The inventor of the present application considers a cause of this to be that flat softmax vectors based on unknown data (that is, data generated by the data generation unit 11) are included in the input of the label noise degree estimation unit 15. That is, although label noise is originally a concept defined for known data, the first embodiment uses an evaluation value that integrates known and unknown data. Specifically, the softmax vector originally desired as the likelihood of each label is p(y|x, D = {training set}), but the softmax vector actually obtained is p(y|x, D = {training set, generated data}).
Therefore, a second embodiment improved on the basis of the above consideration will be described next. In the second embodiment, points of difference from the first embodiment will be described. Points not particularly mentioned in the second embodiment may be similar to those of the first embodiment.
In
More specifically, in the second embodiment, the class likelihood estimation unit 13 is trained using only the actual data included in the training set.
The sharp likelihood estimation unit 17 estimates (calculates) the likelihood of each label for the input data. The likelihood of each label is calculated in the softmax layer of the deep learning model, and the sharp likelihood estimation unit 17 is trained using both the generated data and the actual data. In these respects, the sharp likelihood estimation unit 17 is the same as the class likelihood estimation unit 13 in the first embodiment. However, the sharp likelihood estimation unit 17 estimates (outputs) a sharp softmax vector. To enable such estimation, the sharp likelihood estimation unit 17 may perform learning so that the softmax vector of the estimation result becomes sharp. One example of such a learning method is to use an entropy term of the softmax vector as a constraint term of the loss function. Since a sharp vector is equivalent to low entropy, performing learning so that the entropy becomes small is expected to yield sharp vectors.
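The entropy constraint term can be sketched as follows: a cross-entropy loss with the prediction's entropy added as a penalty. This is a minimal numerical illustration, not the embodiment's training code; the function name and the weighting coefficient `lam` are assumptions.

```python
import numpy as np

def sharpening_loss(softmax: np.ndarray, onehot_target: np.ndarray,
                    lam: float = 0.1) -> float:
    """Cross-entropy plus an entropy constraint term on the prediction.

    Minimizing the added entropy term pushes the estimated softmax
    vector toward a sharp (low-entropy) distribution.
    """
    eps = 1e-12  # assumed numerical-stability constant
    cross_entropy = -np.sum(onehot_target * np.log(softmax + eps))
    entropy = -np.sum(softmax * np.log(softmax + eps))
    return float(cross_entropy + lam * entropy)
```

A sharp, correct prediction attains a lower loss than a flat one both through the cross-entropy term and through the entropy penalty.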
Alternatively, after performing learning similar to that of the class likelihood estimation unit 13 in the first embodiment, the sharp likelihood estimation unit 17 may perform a conversion that sharpens flat softmax vectors among the softmax vectors obtained as estimation results of that learning (hereinafter referred to as “initial estimation results”). For example, the conversion that sharpens a flat softmax vector may be performed by the following procedures (1) to (3).
In addition, various conversion methods are conceivable, such as binarizing each dimension of the softmax vector using, as a threshold, the maximum value of the softmax vector of the estimation result minus ε (where ε is a small value such as 10⁻⁹).
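The binarization conversion can be sketched as follows. The renormalization at the end is an addition for illustration so that the output remains a probability vector; the text specifies only the binarization itself.

```python
import numpy as np

def sharpen_by_binarization(softmax: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Binarize each dimension against (max - eps), then renormalize.

    With eps small, only dimensions (near-)tied with the maximum survive,
    so the vector is converted into a sharp one.
    """
    threshold = softmax.max() - eps
    binary = (softmax >= threshold).astype(float)
    return binary / binary.sum()  # assumed renormalization step
```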
The class likelihood correction unit 18 corrects the likelihood estimated by the class likelihood estimation unit 13 on the basis of the unknown degree estimated by the unknown degree estimation unit 12 and the likelihood estimated by the sharp likelihood estimation unit 17. Examples of correction methods include a method of weighting by the unknown degree as in (1) of the following [Math. 1] (that is, a method of using the weighted sum as the corrected value), and a method of selecting either the likelihood estimated by the class likelihood estimation unit 13 or the likelihood estimated by the sharp likelihood estimation unit 17 according to a condition on the unknown degree as in (2) of the following [Math. 1]. The class likelihood correction unit 18 may correct the likelihood estimated by the class likelihood estimation unit 13 using different methods (algorithms) for the output to the label noise degree estimation unit 15 and for the output to the class label estimation unit 14.
Here, rf is the unknown degree, softmax is the output (softmax vector) from the class likelihood estimation unit 13, softmax_sharp is the output (softmax vector) from the sharp likelihood estimation unit 17, and th is a threshold.
In [Math. 1], (2-1) indicates that the output of the sharp likelihood estimation unit 17 is selectively used for data estimated not to be actual data (that output is used as the corrected likelihood). (2-2) indicates that the output of the class likelihood estimation unit 13 is selectively used for data estimated to be actual data (that output is used as the corrected likelihood).
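The two correction methods can be sketched as follows. [Math. 1] itself is not reproduced in the text, so the exact form of the weighted sum below (rf weighting the sharp output) is an assumption; the selection form follows (2-1) and (2-2) directly.

```python
import numpy as np

def correct_likelihood(softmax: np.ndarray, softmax_sharp: np.ndarray,
                       rf: float, th: float = 0.5,
                       mode: str = "select") -> np.ndarray:
    """Correction by the class likelihood correction unit 18 (a sketch).

    - "weight": rf-weighted sum of the two softmax vectors
      (the assumed form of (1) of [Math. 1]).
    - "select": use the sharp output when rf exceeds th, i.e. for data
      estimated not to be actual data (2-1); otherwise use the output
      of the class likelihood estimation unit (2-2).
    """
    if mode == "weight":
        return rf * softmax_sharp + (1.0 - rf) * softmax  # assumed weighting
    return softmax_sharp if rf > th else softmax
```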
The addition of the sharp likelihood estimation unit 17 and the class likelihood correction unit 18 is expected to improve the estimation accuracy of the cause estimation unit 16. That is, a case where the unknown degree is higher than the threshold α and, at the same time, the label noise degree is higher than the threshold β is logically conceivable, but it is expected that such a case will be eliminated by the sharp likelihood estimation unit 17 and the class likelihood correction unit 18.
In the second embodiment, the class label estimation unit 14 and the label noise degree estimation unit 15 are different from the first embodiment in that the output from the class likelihood correction unit 18 is input instead of the output from the class likelihood estimation unit 13.
The data generation unit 11 performs learning so that the unknown degree is estimated to be low by the unknown degree estimation unit 12 and the same label as the class label signal is estimated by the class label estimation unit 14, similarly to the conventional ACGAN.
The unknown degree estimation unit 12 performs learning so that it can discriminate whether the input data is the output of the data generation unit 11 or the actual data, similarly to the conventional ACGAN.
The sharp likelihood estimation unit 17 takes the generated data and the actual data in the training set as inputs, and performs learning so that the likelihood of the label of the input data becomes relatively high. For example, the sharp likelihood estimation unit 17 performs learning so that the likelihood becomes overwhelmingly high, such as a likelihood of 99% for the correct class. The label of the input data is the label indicated by the class label signal when the input data is generated data, and is the label given to the actual data in the training set when the input data is actual data in the training set.
The class likelihood estimation unit 13 performs learning so that the likelihood of the label given to the actual data serving as the input becomes relatively high. At the time of learning, no generated data is input to the class likelihood estimation unit 13.
The class likelihood correction unit 18 corrects the likelihood of each label estimated by the class likelihood estimation unit 13 on the basis of the unknown degree estimated by the unknown degree estimation unit 12 and the likelihood of each label estimated by the sharp likelihood estimation unit 17.
The class label estimation unit 14 estimates the label of the input data on the basis of the likelihood of each label corrected by the class likelihood correction unit 18. The estimation result is used for learning of the data generation unit 11.
The processing of each unit at the time of inference is as described above. That is, the unknown degree estimation unit 12 estimates the unknown degree of the actual data. Each of the sharp likelihood estimation unit 17 and the class likelihood estimation unit 13 estimates the likelihood of each label for the actual data. The class likelihood correction unit 18 corrects the softmax vector which is an estimation result from the class likelihood estimation unit 13 on the basis of the unknown degree estimated by the unknown degree estimation unit 12 and the estimation result from the sharp likelihood estimation unit 17. The class label estimation unit 14 estimates the label of the actual data on the basis of the corrected likelihood of each label. The label noise degree estimation unit 15 estimates the label noise degree on the basis of the corrected likelihood of each label. The cause estimation unit 16 estimates the cause of the error (unknown, label noise, or no problem) by threshold processing for the unknown degree and the label noise degree.
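The inference flow above can be sketched end to end. The unit functions are passed in as placeholders (the text does not specify their implementations), entropy is used here as the label noise degree, and the selection form of the correction is assumed; all names are illustrative.

```python
import numpy as np

def infer(x, unknown_degree_fn, likelihood_fn, sharp_likelihood_fn,
          alpha: float = 0.5, beta: float = 0.5, th: float = 0.5):
    """End-to-end inference sketch for the second embodiment.

    unknown degree -> two likelihood estimates -> correction by
    selection -> label estimation and label noise degree (entropy)
    -> cause estimation by threshold processing.
    """
    rf = unknown_degree_fn(x)  # unknown degree estimation unit 12
    # class likelihood correction unit 18 (selection form assumed)
    corrected = sharp_likelihood_fn(x) if rf > th else likelihood_fn(x)
    label = int(np.argmax(corrected))           # class label estimation unit 14
    noise = float(-np.sum(corrected * np.log(corrected + 1e-12)))
    if rf > alpha:                              # cause estimation unit 16
        cause = "unknown"
    elif noise > beta:
        cause = "label noise"
    else:
        cause = "no problem"
    return label, cause
```

A usage example would supply the three functions as trained-model wrappers; here any callables returning a scalar and softmax vectors suffice.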
Note that the type of label noise is different between
In both
In the second embodiment, since the unknown degree and the label noise degree are evaluated independently of each other, there is no guarantee that the label noise degree is lowered in the unknown data, but according to
The performance of detecting unknown data is similar between the “rf” column and the “ex rf” column. This indicates that the change in the likelihood estimation method for each label has almost no adverse effect on the detection of unknown data by the unknown degree.
As described above, according to the second embodiment, it is possible to automatically estimate the cause of an error made by the deep learning model while executing the task (label estimation). In addition, it is possible to secure validity as an evaluation value of label noise. Further, it is possible to prevent the flatness of the softmax vector, which is the evaluation value of label noise, from reacting to unknown data (that is, to avoid the softmax vector becoming flat for unknown data), and to improve the performance of estimating errors due to label noise.
In the second embodiment, the class label estimation apparatus 10a is an example of a learning apparatus and the class label estimation apparatus 10. The class likelihood estimation unit 13 is an example of a first class likelihood estimation unit. The sharp likelihood estimation unit 17 is an example of a second class likelihood estimation unit.
Although the embodiments of the present invention have been described in detail above, the present invention is not limited to these particular embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---
PCT/JP2020/039602 | 10/21/2020 | WO |