This application claims the benefit of Korean Patent Application No. 10-2023-0053239, filed Apr. 24, 2023, which is hereby incorporated by reference in its entirety into this application.
The present disclosure relates generally to technology for contrastive learning of a neural network for automatic data labeling, and more particularly to contrastive learning technology for discovering features of large-scale data, grouping the large-scale data by similar features, and training a neural network based thereon.
In Artificial Intelligence (AI), data processing is the most time-consuming and costly part and requires human intervention. However, as AI neural networks became deeper and the amount of data required to be processed increased, self-supervised learning technology through which features of large-scale data can be discovered and represented without human intervention has been developed. Contrastive learning, which is one of methods for data representations in self-supervised learning, is a data classification technique for representing diverse properties of data through a mechanism in which data transformed from anchor data is brought closer but data transformed from data other than that is pushed apart by considering the same as a different representation.
In existing contrastive learning, because there is no information about a classification system for the anchor data, the output of the neural network for data transformed from the anchor data is represented as belonging to the same category as the anchor data. Loss is then calculated by representing the other data samples as belonging to a different category, and the weight values of the neural network may be updated based thereon. Here, when two pieces of transformed data generated from a single piece of input data are respectively input to neural networks of the same structure, the outputs are predicted to be the same representation, whereas the other pieces of input data in the same set are predicted to be different representations. Through such prediction, data classification is performed so that the same data is pulled closer and different data is pushed away; an algorithm based on this technique is referred to as contrastive learning and is applied to automatic data labeling.
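For illustration only, the following Python sketch (using NumPy; all function and variable names are hypothetical and not part of any disclosed embodiment) expresses this conventional scheme, in which the only positive for an anchor view is its own transformed view and every other view in the batch is pushed away:

```python
import numpy as np

def l2_normalize(v):
    # Unit-normalize feature vectors so that the dot product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def conventional_pair_loss(features, i, j, t=0.1):
    """Existing-style contrastive loss for anchor view i: its only positive is
    its own transformed view j, and every other view in 'features' is treated
    as a negative and pushed away (all names here are illustrative)."""
    z = l2_normalize(np.asarray(features, dtype=float))
    sims = z @ z[i] / t                     # scaled cosine similarities to view i
    mask = np.ones(len(z), dtype=bool)
    mask[i] = False                         # a view never contrasts with itself
    return -np.log(np.exp(sims[j]) / np.exp(sims[mask]).sum())
```

For a batch of N inputs, `features` would hold the 2N embeddings obtained by passing the 2N transformed samples through a shared encoder, and the batch loss would be obtained by averaging this quantity over all anchor/positive index pairs.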
However, a single batch of data corresponding to the batch size, which is set for updating the neural network, may contain different pieces of data that should be regarded as being of the same type, and the existing method cannot take this case into account.
An object of the present disclosure is to calculate a loss value by reflecting not only a data sample generated from an anchor sample but also a similar data sample belonging to the same kind of class as the anchor sample, thereby improving accuracy when a neural network is updated.
Another object of the present disclosure is to improve the prediction accuracy of a neural network through more accurate revision thereof.
A further object of the present disclosure is to calculate a loss value by considering similar data belonging to the same kind of class as an anchor sample as positive data, thereby performing more accurate update of a neural network.
In order to accomplish the above objects, a method for contrastive learning of a neural network for automatic data labeling according to the present disclosure includes generating transformed data samples for unlabeled input data samples corresponding to a batch size, detecting a positive sample (positive data) in the transformed data samples in consideration of the similarity between a single data sample for contrastive learning, among the input data samples, and each of the transformed data samples, and performing contrastive learning of a neural network based on a loss value of the positive sample.
Here, a transformed data sample, the cosine similarity of which with the single data sample is equal to or greater than a preset reference value, may be detected as the positive sample.
Here, the positive sample may include a transformed data sample belonging to the same kind of class as the single data sample depending on the cosine similarity.
Here, a transformed data sample belonging to the same kind of class as the single data sample and having the cosine similarity equal to or greater than the preset reference value may be classified as the positive sample.
Here, when a data sample is a transformed data sample for the single data sample but has the cosine similarity less than the preset reference value, the data sample may be classified as a negative sample (negative data).
Here, the lower the preset reference value for the cosine similarity, the more positive samples are detected, and the higher the preset reference value for the cosine similarity, the fewer positive samples are detected.
Here, a total loss value may be calculated based on the loss value of the positive sample, and the neural network may be updated by setting weights of the neural network so as to minimize the total loss value.
Also, an apparatus for contrastive learning of a neural network for automatic data labeling according to an embodiment of the present disclosure includes a processor for generating transformed data samples for unlabeled input data samples corresponding to a batch size, detecting a positive sample (positive data) in the transformed data samples in consideration of the similarity between a single data sample for contrastive learning, among the input data samples, and each of the transformed data samples, and performing contrastive learning of a neural network based on a loss value of the positive sample; and memory for storing the neural network.
Here, the processor may detect a transformed data sample, the cosine similarity of which with the single data sample is equal to or greater than a preset reference value, as the positive sample.
Here, the positive sample may include a transformed data sample belonging to the same kind of class as the single data sample depending on the cosine similarity.
Here, the processor may classify a transformed data sample belonging to the same kind of class as the single data sample and having the cosine similarity equal to or greater than the preset reference value as the positive sample.
Here, when a data sample is a transformed data sample for the single data sample but has the cosine similarity less than the preset reference value, the processor may classify the data sample as a negative sample (negative data).
Here, the lower the preset reference value for the cosine similarity, the more positive samples are detected, and the higher the preset reference value for the cosine similarity, the fewer positive samples are detected.
Here, the processor may calculate a total loss value based on the loss value of the positive sample and update the neural network by setting weights of the neural network so as to minimize the total loss value.
The above and other objects, features, and advantages of the present disclosure will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
The present disclosure will be described in detail below with reference to the accompanying drawings. Repeated descriptions and descriptions of known functions and configurations which have been deemed to unnecessarily obscure the gist of the present disclosure will be omitted below. The embodiments of the present disclosure are intended to fully describe the present disclosure to a person having ordinary knowledge in the art to which the present disclosure pertains. Accordingly, the shapes, sizes, etc. of components in the drawings may be exaggerated in order to make the description clearer.
In the present specification, each of expressions such as “A or B”, “at least one of A and B”, “at least one of A or B”, “A, B, or C”, “at least one of A, B, and C”, and “at least one of A, B, or C” may include any one of the items listed in the expression or all possible combinations thereof.
Hereinafter, a preferred embodiment of the present disclosure will be described in detail with reference to the accompanying drawings.
Referring to
Generally, in contrastive learning, the neural network to be trained is obtained by updating its weights based on a loss value computed between data similar to an anchor sample (a positive sample) and data having characteristics different from those of the anchor sample (a negative sample). That is, similar data is pulled so that its feature vector moves closer to that of the anchor, whereas dissimilar data is pushed so that its feature vector moves farther away.
For example, because a neural network for contrastive learning is trained with unlabeled data samples 210, 220, 230, and 240, as shown in
Also, the batch size, which is the size of a batch serving as the unit of data processing, may be set large in order to improve the performance of contrastive learning; however, when the batch size is set large, the batch may include a large number of data samples belonging to the same class as an anchor sample. Nevertheless, in the existing contrastive learning method, neural network training is performed by calculating the loss while treating such same-class data samples in the batch (the transformed data samples 221 and 222 in the same class as the anchor sample 210) as belonging to a different class, as illustrated in
The present disclosure proposes a contrastive learning method that performs neural network learning by effectively extracting data of the same class as positive data (a positive sample), as illustrated in
Here, in contrastive learning, the neural network parameters are set by repeating training over the training dataset 300 to 1000 times (epochs). Because the training dataset for each epoch is typically too large to be processed at once, it is divided into multiple batches, which are the units of neural network training, and processed over multiple iterations.
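A minimal sketch of this division into epochs and batches (the loop structure only; the dataset object, the `train_step` callback, and all names are assumptions made for illustration):

```python
def run_contrastive_training(dataset, batch_size, num_epochs, train_step):
    # Hypothetical outer loop: each epoch sweeps the whole training dataset
    # once, one batch (the unit of neural-network training) at a time.
    for epoch in range(num_epochs):                       # e.g., 300 to 1000 epochs
        for start in range(0, len(dataset), batch_size):  # one iteration per batch
            batch = dataset[start:start + batch_size]
            train_step(batch)                             # contrastive-learning update
```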
Also, in the method for contrastive learning of a neural network for automatic data labeling according to an embodiment of the present disclosure, a positive sample (positive data) is detected in the transformed data samples in consideration of the similarity between a single data sample for contrastive learning, among the input data samples, and each of the transformed data samples at step S120.
Here, a transformed data sample, the cosine similarity of which with the single data sample is equal to or greater than a preset reference value, may be detected as the positive sample.
Here, the positive sample may also include a transformed data sample belonging to the same kind of class as the single data sample depending on the cosine similarity.
Here, a transformed data sample belonging to the same kind of class as the single data sample and having the cosine similarity equal to or greater than the preset reference value may be classified as a positive sample.
Here, when a data sample is a transformed data sample for the single data sample but has cosine similarity less than the preset reference value, the data sample may be classified as a negative sample (negative data).
For example, data transformation may be performed by cropping a large image into small parts, as illustrated in
Considering this situation, the present disclosure detects positive samples based on cosine similarity such that the transformed data samples 221 and 222 illustrated in
Here, the lower the preset reference value for the cosine similarity, the more positive samples are detected, and the higher the preset reference value for the cosine similarity, the fewer positive samples are detected.
For example, the cosine similarity is the inner product of two vectors divided by the product of their magnitudes (equivalently, the inner product of the two normalized vectors), and the closer the value is to 1, the higher the cosine similarity is determined to be, in which case the angle θ between the two vectors has a value close to 0. Accordingly, when the cosine similarity between two data samples transformed from a single data sample is calculated, the cosine similarity between the transformed data samples 211 and 212 illustrated in
Here, because data samples having the features of a class corresponding to ‘DOG’, like the transformed data samples 211 and 212, have high cosine similarity, both of these transformed data samples may be determined to belong to the ‘DOG’ class. Accordingly, the cosine similarity is also calculated for the data samples 221, 222, 231, 232, 241, and 242, which are transformed from different data samples, and all of the transformed data samples whose cosine similarity satisfies the preset reference value (e.g., 0.98, θ=10°) may be detected as positive samples.
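A minimal Python sketch of this similarity-based detection, assuming feature vectors have already been extracted by the neural network (the function names are illustrative, and the 0.98 reference value simply repeats the example above):

```python
import numpy as np

def cosine_similarity(a, b):
    # Inner product of the two vectors divided by the product of their norms,
    # i.e., the inner product of the unit-normalized vectors (cosine of the angle θ).
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_positive_samples(anchor_feature, transformed_features, reference=0.98):
    """Return the indices of transformed samples whose cosine similarity with the
    anchor feature is equal to or greater than the preset reference value."""
    return [k for k, f in enumerate(transformed_features)
            if cosine_similarity(anchor_feature, f) >= reference]
```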
That is, in the present disclosure, the range of positive data is set to each class for data classification, rather than being limited to each data sample, whereby positive data may be more clearly determined.
Here, the angle θ between the two vectors, which is a derived parameter, may be used as an important factor for determining how close two pieces of data are. According to an embodiment of the present disclosure, high performance is achieved when the cosine similarity is 0.97 (θ=13°).
Also, in the method for contrastive learning of a neural network for automatic data labeling according to an embodiment of the present disclosure, contrastive learning of a neural network is performed based on the loss value of the positive sample at step S130.
Here, a total loss value is calculated based on the loss value of the positive sample, and the weights of the neural network are set such that the total loss value is minimized, whereby the neural network may be updated.
Here, the loss value of the positive sample may be calculated as shown in Equation (1) below:
Here, I denotes the transformed data samples, the number of which is 2N, and P(i) denotes the set of samples that are determined to be positive samples for sample i based on cosine similarity. Also, k may index all of the data samples excluding the positive samples transformed from the anchor sample among the transformed data samples, and the number thereof may be 2(N−1). Also, t indicates the sensitivity with which two pieces of data push or attract each other (commonly referred to as a temperature parameter): when t is close to 1, the sensitivity is low, and when t is close to 0, the sensitivity is high.
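Equation (1) itself is not reproduced in this text. Based on the terms just defined (I, P(i), k, and t), its form is presumably analogous to a supervised-contrastive-style loss; a reconstruction under that assumption (not the verbatim equation of the disclosure) is:

```latex
% Assumed reconstruction of Equation (1), using the terms defined above.
L_{total} = \sum_{i \in I} \frac{-1}{\lvert P(i) \rvert} \sum_{p \in P(i)}
  \log \frac{\exp\!\left( z_i \cdot z_p / t \right)}
            {\sum_{k} \exp\!\left( z_i \cdot z_k / t \right)}
```

Here z_i denotes the normalized feature vector output by the neural network for transformed sample i, P(i) is the set of positive samples detected for i by cosine similarity, and the denominator sums over the samples indexed by k as described above.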
Here, according to the present disclosure, because multiple positive samples can be included, it is necessary to consider the effect of a change in t.
For example, the number of detected positive samples decreases as the reference value for the cosine similarity used to determine positive samples is raised, and increases as the reference value is lowered. Therefore, the correlation between the above-described angle θ between two vectors and t, which indicates the sensitivity with which two pieces of data are pulled closer together or pushed farther apart, needs to be treated as an important parameter affecting positive-sample detection performance, and its value needs to be adjusted appropriately for performance improvement.
Through the above-described method for contrastive learning of a neural network for automatic data labeling, positive samples are more clearly determined. Accordingly, performance may be improved compared to the existing contrastive learning method, and this method may be applied to applications for automatic data labeling.
Also, the present disclosure may improve the prediction accuracy of a neural network through more accurate revision thereof.
First, the flowchart illustrated in
Also, because contrastive learning requires extracting features from unlabeled data samples and clustering samples having similar features, huge amounts of training data are required, and ImageNet is commonly used. Because ImageNet includes 1.28 million images classified into 1000 classes, when the batch size is set to 256, contrastive learning may be performed over 5000 iterations per epoch by dividing the 1.28 million images into batches of size 256. Assuming that training is repeated for 300 epochs, the 5000 iterations of the learning process have to be repeated 300 times, so steps S510 to S560 may be performed a total of 1,500,000 times (300×5000=1,500,000).
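The arithmetic above can be checked directly (the values are simply the cited example figures):

```python
# Sanity check of the iteration counts described above, using the cited
# example values (1.28 million ImageNet images, batch size 256, 300 epochs).
num_images = 1_280_000
batch_size = 256
num_epochs = 300

iterations_per_epoch = num_images // batch_size          # 5000 batches per epoch
total_iterations = iterations_per_epoch * num_epochs     # 300 x 5000
print(iterations_per_epoch, total_iterations)            # 5000 1500000
```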
Referring to
Subsequently, cosine similarity may be calculated for the 2N transformed data samples at step S530.
Subsequently, the partial loss for each transformed data sample may be calculated as an entropy value at step S540 by reflecting the cosine similarity with the positive sample in the numerator and the cosine similarities with the negative samples in the denominator.
Subsequently, a total loss value for the 2N transformed data samples is calculated at step S550, and a neural network may be updated so as to minimize the calculated total loss value at step S560.
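A schematic NumPy version of this conventional per-batch computation, assuming the encoder and optimizer of steps S510, S520, and S560 are handled elsewhere and that views 2i and 2i+1 originate from the same input sample (all names are illustrative, and the exact membership of the denominator is simplified):

```python
import numpy as np

def existing_batch_loss(features, t=0.1):
    """Schematic version of steps S530-S550: 'features' holds the 2N embeddings;
    only the view transformed from the same input contributes to the numerator,
    and every other view appears in the denominator."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    exp_sims = np.exp(z @ z.T / t)           # exponentiated, scaled cosine similarities
    np.fill_diagonal(exp_sims, 0.0)          # a view never contrasts with itself
    losses = []
    for i in range(len(z)):
        j = i + 1 if i % 2 == 0 else i - 1   # the single assumed positive (S540 numerator)
        losses.append(-np.log(exp_sims[i, j] / exp_sims[i].sum()))
    return float(np.mean(losses))            # total loss to be minimized (S550, S560)
```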
However, because the existing method updates the neural network by calculating the loss on the assumption that, among the input data samples corresponding to the batch size (N), only one data sample is a positive sample, as described above, the transformed data samples 221 and 222 of the same kind of class that are input in the same batch are not classified as positive samples, as illustrated in
Accordingly, considering this situation, the present disclosure proposes a method that replaces step S540 illustrated in
That is, referring to steps S640 and S650 in
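A corresponding sketch of the proposed per-batch computation, in which the positive set for each anchor view is determined by the cosine-similarity reference value rather than by origin alone (the fallback to the paired view when no sample meets the reference value, and all names, are assumptions for illustration):

```python
import numpy as np

def proposed_batch_loss(features, reference=0.98, t=0.1):
    """Sketch of the proposed replacement for step S540 (steps S640 and S650):
    every other view whose cosine similarity meets the reference value is
    treated as a positive, and the per-anchor loss is averaged over P(i)."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    cos = z @ z.T                                   # pairwise cosine similarities
    exp_sims = np.exp(cos / t)
    np.fill_diagonal(exp_sims, 0.0)                 # exclude self-contrast
    n = len(z)
    losses = []
    for i in range(n):
        positives = np.where((cos[i] >= reference) & (np.arange(n) != i))[0]
        if positives.size == 0:                     # assumed fallback: own transformed view
            positives = np.array([i + 1 if i % 2 == 0 else i - 1])
        per_positive = -np.log(exp_sims[i, positives] / exp_sims[i].sum())
        losses.append(per_positive.mean())          # average over the detected positives P(i)
    return float(np.mean(losses))
```

Whether detected positive samples should also be excluded from the denominator, as the 2(N−1) count described for Equation (1) may suggest, is a design detail that this sketch does not settle.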
Referring to
Accordingly, an embodiment of the present disclosure may be implemented as a non-transitory computer-readable medium in which a method implemented using a computer or instructions executable by a computer are recorded. When executed by a processor, the computer-readable instructions may perform a method according to at least one aspect of the present disclosure.
The processor 710 generates transformed data samples for unlabeled input data samples corresponding to a batch size.
Also, the processor 710 detects a positive sample (positive data) in the transformed data samples in consideration of the similarity between a single data sample for contrastive learning, among the input data samples, and each of the transformed data samples.
Here, a transformed data sample, the cosine similarity of which with the single data sample is equal to or greater than a preset reference value, may be detected as a positive sample.
Here, the positive sample may also include a transformed data sample belonging to the same kind of class as the single data sample depending on the cosine similarity.
Here, a transformed data sample belonging to the same kind of class as the single data sample and having cosine similarity equal to or greater than the preset reference value may be classified as a positive sample.
Here, when a data sample is a transformed data sample for the single data sample but has cosine similarity less than the preset reference value, the data sample may be classified as a negative sample (negative data).
Here, the lower the preset reference value for the cosine similarity, the more positive data samples are detected, but the higher the preset reference value for the cosine similarity, the fewer positive data samples are detected.
Also, the processor 710 performs contrastive learning of a neural network based on a loss value of the positive sample.
Here, a total loss value is calculated based on the loss value of the positive sample, and the neural network may be updated by setting the weights of the neural network so as to minimize the total loss value.
The memory 730 stores the neural network.
Using the above-described apparatus for contrastive learning of a neural network for automatic data labeling, not only a data sample generated from an anchor sample but also a similar data sample belonging to the same kind of class as the anchor sample are reflected in calculation of a loss value, whereby accuracy may be improved when a neural network is updated.
Also, the prediction accuracy of the neural network may be improved through more accurate revision thereof.
Also, the loss value is calculated by considering similar data belonging to the same kind of class as an anchor sample as a positive sample of the anchor sample, whereby the neural network may be updated more accurately.
According to the present disclosure, a loss value is calculated by reflecting not only a data sample generated from an anchor sample but also a similar data sample belonging to the same kind of class as the anchor sample, whereby accuracy may be improved when a neural network is updated.
Also, the present disclosure may improve the prediction accuracy of a neural network through more accurate revision thereof.
Also, the present disclosure calculates a loss value by considering similar data belonging to the same kind of class as an anchor sample as positive data, thereby performing more accurate update of a neural network.
As described above, the method for contrastive learning of a neural network for automatic data labeling and the apparatus for the same according to the present disclosure are not limitedly applied to the configurations and operations of the above-described embodiments, but all or some of the embodiments may be selectively combined and configured, so the embodiments may be modified in various ways.