The present application claims priority to Korean Patent Application No. 10-2022-0082728, filed Jul. 5, 2022, the entire contents of which are incorporated herein for all purposes by this reference.
The present disclosure relates to a method of determining an early stopping point in time of classification neural network learning using unlabeled data.
A neural network is supervised-trained using a previously labeled training dataset, and when the number of training iterations exceeds a certain number, the neural network becomes overfitted to the training dataset, causing its performance on a test dataset to decrease. A user has to stop learning of the neural network at an appropriate point in time in consideration of this problem, which is called early stopping.
In detail, referring to
Referring to
However, when an early stopping point in time is determined in this way, a portion of the labeled dataset must be allocated as the validation dataset, so the amount of training data available for actual training decreases. This problem is more severe when a neural network is trained for tasks in which it is difficult to secure a large labeled dataset, for example, a task of classifying medical images.
An objective of the present disclosure is to determine an early stopping point in time of neural network learning using a large amount of unlabeled data in addition to a small amount of labeled data.
The objectives of the present disclosure are not limited to those described above, and other objectives and advantages not stated herein may be understood from the following description and will become clearer from the embodiments of the present disclosure. Further, it will be readily appreciated that the objectives and advantages of the present disclosure can be achieved by the configurations described in the claims and combinations thereof.
In order to achieve the objectives described above, an early stopping method for a neural network according to an embodiment of the present disclosure includes: dividing a labeled dataset into a training dataset and a validation dataset; creating a pretrained neural network by training the pretrained neural network using the training dataset and early stopping learning of the pretrained neural network using the validation dataset; and creating a target neural network for each epoch by training the target neural network using the entire labeled dataset, and early stopping learning of the target neural network on the basis of a similarity between output of the pretrained neural network on at least one of the labeled data and unlabeled data and output of the target neural network on the unlabeled data.
In an embodiment, the early stopping includes early stopping learning of the target neural network at an epoch at which the similarity between the outputs of the pretrained neural network and the target neural network is the maximum.
In an embodiment, the early stopping includes early stopping learning of the target neural network on the basis of a similarity between a sample confidence of the pretrained neural network on the labeled dataset and a sample confidence of the target neural network on the unlabeled dataset.
In an embodiment, the early stopping includes: creating a first confidence graph by arranging sample confidences of the pretrained neural network in order of magnitude; creating a second confidence graph by arranging sample confidences of the target neural network in order of magnitude; and early stopping learning of the target neural network on the basis of a similarity between the first and second confidence graphs.
In an embodiment, the early stopping includes: sampling the second confidence graph such that the numbers of samples corresponding to the first and second confidence graphs become the same; and early stopping learning of the target neural network on the basis of a similarity between the first confidence graph and the sampled second confidence graph.
In an embodiment, the early stopping includes early stopping learning of the target neural network on the basis of a similarity between prediction class distributions of the pretrained neural network and the target neural network on unlabeled data.
In an embodiment, the early stopping includes: calibrating the prediction class distribution of the pretrained neural network on the unlabeled data on the basis of the prediction class distribution of the pretrained neural network on the validation dataset or an actual class distribution of the labeled dataset and accuracy of the pretrained neural network on the validation dataset; and early stopping learning of the target neural network on the basis of the similarity between the calibrated prediction class distribution of the pretrained neural network and the prediction class distribution of the target neural network.
In an embodiment, the calibrating includes calibrating the prediction class distribution of the pretrained neural network on the unlabeled data in accordance with the following [Equation 1],
(where Cu′ is a calibrated prediction class distribution, B is the prediction class distribution of the pretrained neural network on the validation dataset or the actual class distribution of the labeled dataset, Accval is the accuracy of the pretrained neural network on the validation dataset, nc is the number of classes, and Cu is the prediction class distribution of the pretrained neural network on the unlabeled data).
In an embodiment, the early stopping includes early stopping learning of the target neural network on the basis of a first similarity between a sample confidence of the pretrained neural network on the labeled dataset and a sample confidence of the target neural network on unlabeled data and a second similarity between prediction class distributions of the pretrained neural network and the target neural network on the unlabeled data.
In an embodiment, the early stopping includes further training the target neural network for preset epochs including an epoch at which the first similarity is the maximum, and early stopping learning of the target neural network at the epoch at which the second similarity is the maximum among the preset epochs.
According to the present disclosure, it is possible to train a neural network using the entire labeled dataset without allocating a portion of the labeled dataset as a validation dataset, so it is possible to improve the performance of the neural network.
Further, according to the present disclosure, an ideal early stopping point in time of learning of a neural network is determined using a large amount of unlabeled data, so the disclosure is very useful for training a neural network, particularly for tasks with only a small labeled dataset.
Detailed effects of the present disclosure in addition to the above effects will be described with the following detailed description for accomplishing the present disclosure.
The accompanying drawings of this specification illustrate preferred embodiments and, together with the following detailed description, help provide an understanding of the present invention; therefore, the present invention should not be construed as being limited to the drawings.
The objects, characteristics, and advantages will be described in detail below with reference to the accompanying drawings, so that those skilled in the art can easily carry out the spirit of the present disclosure. However, in describing the present disclosure, detailed descriptions of well-known technologies will be omitted so as not to obscure the description of the present disclosure with unnecessary details. Hereinafter, exemplary embodiments of the present invention will be described with reference to the accompanying drawings. The same reference numerals are used to indicate the same or similar components in the drawings.
Although terms ‘first’, ‘second’, etc. are used to describe various components in the specification, it should be noted that these components are not limited by the terms. These terms are used to discriminate one component from another component and it is apparent that a first component may be a second component unless specifically stated otherwise.
Further, when a certain configuration is disposed “over (or under)” or “on (or beneath)” a component in the specification, it may mean not only that the certain configuration is disposed on the top (or bottom) of the component, but also that another configuration may be interposed between the component and the certain configuration disposed on (or beneath) the component.
Further, when a certain component is “connected”, “coupled”, or “jointed” to another component in the specification, it should be understood that the components may be directly connected or jointed to each other, but another component may be “interposed” between the components or the components may be “connected”, “coupled”, or “jointed” through another component.
Further, singular forms that are used in this specification are intended to include plural forms unless the context clearly indicates otherwise. In the specification, terms “configured”, “include”, or the like should not be construed as necessarily including several components or several steps described herein, in which some of the components or steps may not be included or additional components or steps may be further included.
Further, the term “A and/or B” stated in the specification means A, B, or A and B unless specifically stated otherwise, and the term “C to D” means C or more and D or less unless specifically stated otherwise.
The present disclosure relates to a method of determining an early stopping point in time of classification neural network learning using unlabeled data. Hereafter, a method of early stopping neural network learning according to an embodiment of the present disclosure is described in detail with reference to
Referring to
However, the early stopping method for a neural network shown in
The steps shown in
Hereafter, the steps shown in
A processor can divide a labeled dataset into a training dataset and a validation dataset (S10). In this case, the labeled dataset, which is data labeled in advance by a user for supervised learning of a neural network, may be composed of input data and corresponding classes when the neural network performs a classification task.
Referring to
The processor can divide the labeled dataset, as shown in
Hereafter, for the convenience of description, the labeled dataset can be denoted as (x, y), the training dataset as (xt, yt), and the unlabeled dataset as (xu, yu).
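For reference, the dataset division of step S10 can be sketched as follows in Python; the split ratio, the use of scikit-learn, the dummy data, and the validation-set names (xv, yv) are illustrative assumptions only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy labeled dataset (x, y): 100 samples, 10 features, 2 classes (illustrative only).
x = np.random.randn(100, 10)
y = np.random.randint(0, 2, size=100)

# Allocate a portion (here 20%, an illustrative choice) of the labeled dataset
# as the validation dataset (xv, yv); the remainder is the training dataset (xt, yt).
xt, xv, yt, yv = train_test_split(x, y, test_size=0.2, stratify=y, random_state=0)
```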
The processor can train the neural network using the training dataset and can early stop learning of the neural network using the validation dataset, thereby creating a pretrained neural network (S20).
When an initialized neural network is prepared, the processor can supervised-train the neural network by setting the input/output of the neural network on the basis of the training dataset. In detail, the processor can train the neural network by setting the data input to the neural network as xt and the data output from the neural network as yt.
As such supervised training is repeatedly performed, the neural network can learn the correlation of the input/output data (xt, yt), and the parameters (weights and biases) of the neural network can be updated such that, when a certain xt is input to the neural network, the corresponding class yt is output.
However, as described above with reference to
In detail, the processor calculates an error of the neural network using the validation dataset while repeating learning using the training dataset, and early stops learning of the neural network at the point at which the error on the validation dataset is minimum, thereby being able to create a pretrained neural network. Accordingly, the parameters of the pretrained neural network may be determined as the parameters updated up to the early stopping point in time.
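For reference, a minimal sketch of step S20 (pretraining with validation-based early stopping) is given below; the model, the scikit-learn calls, the dummy data, and the patience-based stopping rule are illustrative assumptions rather than a required implementation.

```python
import copy
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

# Illustrative data standing in for the training dataset (xt, yt) and validation dataset (xv, yv).
rng = np.random.default_rng(0)
xt, yt = rng.normal(size=(80, 10)), rng.integers(0, 2, 80)
xv, yv = rng.normal(size=(20, 10)), rng.integers(0, 2, 20)

model = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)
best_err, best_model, patience, wait = np.inf, None, 5, 0

for epoch in range(200):
    model.partial_fit(xt, yt, classes=np.array([0, 1]))  # one epoch of training on (xt, yt)
    err = log_loss(yv, model.predict_proba(xv))           # error on the validation dataset
    if err < best_err:                                    # keep the parameters with the minimum validation error
        best_err, best_model, wait = err, copy.deepcopy(model), 0
    else:
        wait += 1
        if wait >= patience:                              # early stop when the validation error stops improving
            break

pretrained = best_model  # pretrained neural network created by early stopping
```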
However, when an early stopping point in time is determined in accordance with the method described above, there is a problem that, when the amount of labeled data is small, the accuracy of the neural network drops rapidly.
Referring to
Further, when an early stopping point in time is determined in accordance with the method described above, there is a problem that, when the amount of labeled data is small, the difference between the performance of the neural network on the validation dataset and its performance on the test dataset is large.
Referring to
The present disclosure can use not only a pre-prepared labeled dataset but also an unlabeled dataset to determine an ideal early stopping point in time, particularly when the amount of labeled data is small. Hereafter, an early stopping method for a neural network according to the present disclosure is described in detail.
The processor can create a target neural network for each epoch, that is, for each number of training iterations, by training the target neural network using the entire labeled dataset (S30). In this case, learning of the target neural network may start from a newly initialized neural network rather than from the pretrained neural network created in step S20 described above.
Meanwhile, the target neural network may be a neural network having parameters that are updated by the entire labeled dataset. Accordingly, the statement that a target neural network is created for each epoch should be understood to mean that the parameters of the target neural network are determined for each epoch. Meanwhile, the number of unlabeled data in the present disclosure should be understood as being much larger than the number of labeled data.
In detail, the processor can create a target neural network using the entire labeled dataset as a training dataset without dividing the labeled dataset into a training dataset and a validation dataset. Since the supervised learning method of a neural network was described above, it is not described in detail.
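For reference, a minimal sketch of step S30 is given below; creating a “target neural network for each epoch” is represented here by recording the network's outputs on the unlabeled data after every epoch, and the model, library calls, and dummy data are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
x, y = rng.normal(size=(100, 10)), rng.integers(0, 2, 100)  # the entire labeled dataset (x, y)
xu = rng.normal(size=(1000, 10))                             # unlabeled data xu (much more numerous)

target = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)
probs_per_epoch = []  # target-network class probabilities on the unlabeled data, one entry per epoch

for epoch in range(50):
    target.partial_fit(x, y, classes=np.array([0, 1]))  # one epoch over the entire labeled dataset
    probs_per_epoch.append(target.predict_proba(xu))    # output of the "target neural network for this epoch"
```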
When a target neural network is created for each epoch, the processor can calculate similarities between outputs of the pretrained neural network and the target neural network on input samples (S40), and can early stop learning of the target neural network on the basis of the similarity (S50). In this case, the input samples, which are data that are input to the pretrained neural network and the target neural network, may include unlabeled data and labeled data.
In an embodiment, the processor can early stop learning of the target neural network on the basis of a similarity between the sample confidence of the pretrained neural network on the labeled dataset and the sample confidence of the target neural network on the unlabeled data. In this case, the sample confidence may be the maximum value of the class probabilities that are output from a neural network when a sample is input to it.
In detail, the processor can input x included in the labeled dataset to the pretrained neural network previously created and the pretrained neural network can output probabilities that x belongs to each class. In this case, the processor can determine the maximum value of the class probabilities as the sample confidence.
For example, when a pretrained neural network is trained to perform an animal classification task, the processor can input an image x included in a labeled dataset into the pretrained neural network, and the pretrained neural network can output the probabilities that x belongs to each class as in the following [Table 1].
In this case, the processor can recognize each sample confidence as the maximum value of the class probabilities. In detail, the processor can recognize 0.91 as the sample confidence of x1, 0.61 as the sample confidence of x2, and as the sample confidence of x3.
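For reference, the computation of sample confidences can be sketched as follows; apart from the values 0.91 and 0.61 mentioned above, the probability values are arbitrary illustrative numbers and do not reproduce [Table 1].

```python
import numpy as np

# Illustrative class-probability outputs of a neural network for three samples.
class_probs = np.array([
    [0.91, 0.05, 0.04],
    [0.61, 0.30, 0.09],
    [0.45, 0.40, 0.15],
])

# The sample confidence is the maximum of the class probabilities for each sample.
sample_confidence = class_probs.max(axis=1)
print(sample_confidence)  # [0.91 0.61 0.45]
```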
Meanwhile, the processor can input unlabeled data into a target neural network at each epoch while creating the target neural network using the entire labeled dataset for training, and the target neural network can output the probabilities that the unlabeled data belong to the classes, respectively. In this case, the processor, similarly, can determine the maximum value of the class probabilities as the sample confidence.
Referring to
The processor can early stop learning of the target neural network at the epoch at which the similarity between the sample confidence P1 of the pretrained neural network and the sample confidence Pu of the target neural network is the maximum. Referring to
Meanwhile, because the sample confidences P1 and Pu by themselves do not show a tendency that represents their respective datasets, the processor can convert the sample confidences P1 and Pu into graph data having such a tendency to facilitate determining a similarity.
Referring to
The processor can recognize an epoch at which the similarity between the first and second confidence graphs G1 and G2 is the maximum by applying various methods that can determine similarity between graphs, and can early stop learning of the target neural network at the epoch.
Meanwhile, in order to match not only the tendency but also the numbers of data points between the sample confidences P1 and Pu, the sample confidences Pu on the unlabeled data can be sampled down to the number of labeled data.
Referring to
Next, the processor can recognize the epoch at which the similarity (Sconf = sim(P1, Pus)) between the first confidence graph G1 and the sampled graph Gs is the maximum, and can early stop learning of the target neural network at that epoch.
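For reference, the construction of the confidence graphs, the sampling of Pu down to the number of labeled data, and one possible choice of sim() are sketched below; the negative mean absolute difference as sim() and the uniform index sampling are assumptions, since the disclosure leaves the concrete similarity measure open.

```python
import numpy as np

def confidence_graph(probs):
    """Arrange per-sample confidences (maximum class probabilities) in order of magnitude."""
    return np.sort(probs.max(axis=1))[::-1]

def sampled_similarity(p1, pu):
    """Sconf = sim(P1, Pus): downsample the unlabeled confidence graph to the number of
    labeled samples, then compare the two curves (here via negative mean absolute difference)."""
    g1 = confidence_graph(p1)                                          # first confidence graph G1
    g2 = confidence_graph(pu)                                          # second confidence graph G2
    idx = np.linspace(0, len(g2) - 1, num=len(g1)).round().astype(int)
    gs = g2[idx]                                                       # sampled second confidence graph Gs
    return -np.mean(np.abs(g1 - gs))

# Illustrative inputs: class probabilities of the pretrained network on labeled data (p1)
# and of the target network on (many more) unlabeled data (pu).
rng = np.random.default_rng(0)
p1 = rng.dirichlet(np.ones(3), size=50)
pu = rng.dirichlet(np.ones(3), size=500)
print(sampled_similarity(p1, pu))
```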
Referring to
In another embodiment, the processor can early stop learning of a target neural network on the basis of the similarity between the prediction class distributions of a pretrained neural network and the target neural network on unlabeled data. In this case, the prediction class distribution is the class distribution predicted using the unlabeled data and may be determined as the average probability for each class over the unlabeled data.
In detail, the processor can input unlabeled data to the previously created pretrained neural network and the target neural network created for each epoch, and the pretrained neural network and the target neural network can output probabilities that unlabeled data belong to each class. In this case, the processor can determine the average probabilities for each class over the unlabeled data as a prediction class distribution.
For example, when a pretrained neural network and a target neural network are trained to perform an emotion classification task, the processor can input a text xu, which is an unlabeled datum, to each neural network, and the pretrained neural network and the target neural network can output the probabilities that xu belongs to each class as in the following [Table 2] and [Table 3], respectively.
In this case, the processor can determine the prediction class distribution of the pretrained neural network as (0.6, 0.4), that is, ((0.94+0.43+0.76+0.27)/4, (0.06+0.57+0.24+0.73)/4), and the prediction class distribution of the target neural network as (0.58, 0.42), that is, ((0.93+0.23+0.90+0.26)/4, (0.07+0.77+0.10+0.74)/4).
The processor can early stop learning of the target neural network at the epoch at which the similarity between the prediction class distribution of the pretrained neural network and the prediction class distribution of the target neural network is the maximum. In an example, the processor can calculate a cosine similarity between the prediction class distributions of the pretrained neural network and the target neural network at every epoch of the target neural network, and can early stop learning of the target neural network at the epoch at which the cosine similarity is the maximum.
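For reference, the computation of the prediction class distributions and of the cosine similarity between them can be sketched as follows, using the probability values of the example above; the two class names are left unspecified.

```python
import numpy as np

# Class probabilities on four unlabeled texts (values taken from the example above).
pretrained_probs = np.array([[0.94, 0.06], [0.43, 0.57], [0.76, 0.24], [0.27, 0.73]])
target_probs     = np.array([[0.93, 0.07], [0.23, 0.77], [0.90, 0.10], [0.26, 0.74]])

# Prediction class distribution: average probability per class over the unlabeled data.
c_pre = pretrained_probs.mean(axis=0)   # -> [0.6, 0.4]
c_tgt = target_probs.mean(axis=0)       # -> [0.58, 0.42]

# Cosine similarity between the two prediction class distributions.
cos_sim = np.dot(c_pre, c_tgt) / (np.linalg.norm(c_pre) * np.linalg.norm(c_tgt))
print(c_pre, c_tgt, cos_sim)
```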
Meanwhile, the pretrained neural network is trained on the basis of a small amount of training data, so its accuracy may be low in comparison to an ideal case, and its prediction class distribution may also be inaccurate because it depends on the performance of the pretrained neural network. The processor can calibrate the prediction class distribution of the pretrained neural network to compensate for this inaccuracy caused by the low performance of the neural network.
In detail, the processor can calibrate the prediction class distribution in proportion to the difference between the performance of the pretrained neural network and the ideal performance, and to this end, it may use a linear proportion.
In detail, the processor can calibrate the prediction class distribution of the pretrained neural network on an unlabeled dataset on the basis of the prediction class distribution of the pretrained neural network on a validation dataset or an actual class distribution of a labeled dataset and the accuracy of the pretrained neural network on the validation dataset.
Referring to
In this case, when the performance of the pretrained neural network on the validation dataset is Accval, the prediction class distribution on the validation dataset or the actual class distribution of the labeled dataset is B, and the prediction class distribution on the unlabeled dataset is Cu, the processor can use a linear proportion to estimate Cu′, the prediction class distribution that would be obtained if the performance of the pretrained neural network were ideal.
Referring to the example shown in
The processor can early stop learning of the target neural network at the epoch at which the similarity between the calibrated prediction class distribution Cu′ of the pretrained neural network and the prediction class distribution of the target neural network that is output at each epoch is the maximum.
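Since [Equation 1] itself is not reproduced above, the following sketch shows only one plausible form of a linear-proportion calibration built from the variables B, Accval, nc, and Cu defined for [Equation 1]; the exact proportional relationship used here is an assumption and may differ from the claimed equation.

```python
import numpy as np

def calibrate_class_distribution(c_u, b, acc_val, n_c):
    """One plausible linear-proportion calibration (an assumption, not necessarily [Equation 1]):
    the deviation of the predicted distribution Cu from the reference distribution B is scaled
    by how far the validation accuracy Accval lies above the chance level 1/nc, extrapolating
    to an ideal accuracy of 1."""
    scale = (1.0 - 1.0 / n_c) / (acc_val - 1.0 / n_c)
    c_u_prime = b + scale * (c_u - b)
    c_u_prime = np.clip(c_u_prime, 0.0, None)      # keep a valid probability distribution
    return c_u_prime / c_u_prime.sum()

# Illustrative values only.
b = np.array([0.5, 0.5])        # class distribution of the validation/labeled data (B)
c_u = np.array([0.6, 0.4])      # prediction class distribution on the unlabeled data (Cu)
print(calibrate_class_distribution(c_u, b, acc_val=0.8, n_c=2))  # calibrated distribution Cu'
```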
Meanwhile, the processor may perform an early stopping operation by applying both the early stopping method based on the sample confidence described in previous embodiments and the early stopping method based on a prediction class distribution.
In detail, the processor can early stop learning of a target neural network on the basis of a first similarity between the sample confidence of a pretrained neural network on a labeled dataset and the sample confidence of the target neural network on unlabeled data and a second similarity between prediction class distributions of the pretrained neural network and the target neural network on the unlabeled data.
The first similarity and the second similarity have no quantifiable correlation and show independent tendencies, so the processor can early stop learning of the target neural network at an appropriate point in time by referring to both the first and second similarities.
Referring to
In consideration of this, the processor first specifies, on the basis of the first similarity, the epoch range in which a low loss is expected, and then determines, on the basis of the second similarity, the epoch within that range at which the highest accuracy is expected, thereby being able to early stop learning of the target neural network.
In an example, the processor further trains the target neural network for preset epochs including the epoch having the maximum first similarity, and can early stop learning of the target neural network at the epoch having the maximum second similarity among the preset epochs.
Referring to
The second similarity calculated at each epoch has the maximum value at the eighth epoch, so the processor can determine the eighth epoch as an early stopping point in time and can early stop learning of the target neural network at the eighth epoch. That is, the processor can determine the parameters determined at the eighth epoch as the final parameters of the target neural network.
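For reference, this combined criterion can be sketched as follows; the per-epoch similarity values and the size of the preset epoch range are illustrative assumptions.

```python
import numpy as np

def early_stopping_epoch(first_sim, second_sim, extra_epochs=3):
    """Combined criterion sketch: starting from the epoch with the maximum first
    (confidence-based) similarity, consider that epoch plus a preset number of further
    epochs, and stop at the epoch whose second (class-distribution-based) similarity
    is the maximum within that range. extra_epochs is an illustrative assumption."""
    start = int(np.argmax(first_sim))
    end = min(len(second_sim), start + extra_epochs + 1)
    return start + int(np.argmax(second_sim[start:end]))

# Illustrative per-epoch similarity values.
first_sim  = [0.2, 0.4, 0.7, 0.9, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2]
second_sim = [0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.9, 0.8, 0.95, 0.4]
print(early_stopping_epoch(first_sim, second_sim))  # epoch index chosen as the stopping point
```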
According to the present disclosure described above, it is possible to train a neural network using the entire labeled dataset without allocating a portion of the labeled dataset as a validation dataset, so it is possible to improve the performance of the neural network.
Further, according to the present disclosure, an ideal early stopping point in time of learning of a neural network is determined using a large amount of unlabeled data, so the disclosure is very useful for training a neural network, particularly for tasks with only a small labeled dataset.
Although the present disclosure was described with reference to the exemplary drawings, it is apparent that the present disclosure is not limited to the embodiments and drawings in the specification and may be modified in various ways by those skilled in the art within the scope of the spirit of the present disclosure. Further, even though the operational effects according to the configuration of the present disclosure were not explicitly described in the above description of embodiments, it is apparent that effects that can be predicted from the configuration should also be acknowledged.