The present disclosure relates to the field of artificial intelligence, and in particular to a method of generating a classifier by using a small number of labeled images.
Image classification is an image processing method that distinguishes objects of different classes according to the different features they reflect in image information. Instead of relying on human visual interpretation, image classification uses a computer to perform a quantitative analysis on an image and to classify the image, or each pixel or region in the image, into one of several classes.
At present, deep neural network-based classification methods are mature, but most of them rely on massive amounts of labeled data. Furthermore, when the classes of images change, these classification methods cannot adapt quickly, which degrades the image classification.
An objective of the present disclosure is to provide a method of generating a classifier by using a small number of labeled images, so as to ensure the accuracy of image classification.
In an aspect, the objective of the present disclosure is achieved by the following method.
There is provided a method of generating a classifier by using a small number of labeled images, including: pre-training a wide residual network by using a set of labeled data with a data amount meeting requirements, and determining portions of the pre-trained wide residual network except for a fully connected layer as a feature extractor for an image; for an N-class classifier to be generated, randomly selecting N classes from a training set for each of a plurality of times; and for the N classes selected each time: randomly selecting one or more images from each class as training samples, extracting a feature vector from the training samples of each class by using the feature extractor, inputting the N extracted feature vectors into a classifier generator, and sequentially performing, by the classifier generator, a class information fusion and a parameter prediction for the N-class classifier, so as to obtain the N-class classifier.
In another aspect, the objective of the present disclosure is achieved by the following computer device.
There is provided a computer device of generating a classifier by using a small number of labeled images, including: a processor; and a memory storing instructions executable by the processor, where the instructions, when executed by the processor, cause the processor to implement the method of generating a classifier by using a small number of labeled images described above.
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the accompanying drawings in the following description show only some embodiments of the present disclosure. For those of ordinary skill in the art, other drawings may be obtained from these drawings without any inventive effort.
The technical solutions of the present disclosure are clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. Obviously, the embodiments described are only a part, but not all, of the embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without making inventive efforts fall within the protection scope of the present disclosure.
The embodiments of the present disclosure provide a method of generating a classifier by using a small number of labeled images.
In step S100, a wide residual network is pre-trained by using a set of labeled data with a data amount meeting requirements, and the portions of the pre-trained wide residual network except for the fully connected layer are determined as a feature extractor 200 for an image.
In the embodiments of the present disclosure, the step S100 may include: at step S1001, a set of labeled data with a data amount meeting requirements is selected, and the set of labeled data is divided, according to image classes, into a training set and a test set that do not overlap each other; at step S1002, the wide residual network is trained for a predetermined number of times by using the training set; and at step S1003, the trained wide residual network is tested by using the test set.
The wide residual network includes a multi-layer convolutional neural network and a fully connected layer. In the pre-training process, after each image is input into the wide residual network, the output of the fully connected layer at the end of the wide residual network indicates, for each class, a classification score of the input image being classified into that class.
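By way of illustration only, the following is a minimal PyTorch sketch of this pre-training architecture. The backbone module, its feature dimension, and all names here are assumptions for illustration; the disclosure specifies only a wide residual network followed by a fully connected layer.

```python
import torch
import torch.nn as nn

class PretrainNet(nn.Module):
    """Pre-training network of step S100: a wide residual network
    backbone followed by a fully connected classification layer.
    The backbone passed in is a stand-in; any wide residual network
    (e.g. one producing 640-dimensional features, matching the
    example given later in this description) can be used."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                    # multi-layer convolutional network
        self.fc = nn.Linear(feat_dim, num_classes)  # fully connected layer at the end

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        features = self.backbone(images)  # (batch, feat_dim) image features
        return self.fc(features)          # classification score for each class
```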
In the pre-training process, a loss function is defined as:

$$L = -\sum_{i} \log \frac{\exp(s_{i,y})}{\sum_{y'} \exp(s_{i,y'})}$$

where $s_{i,y}$ indicates the classification score of the $i$-th image to be classified being classified into its true class $y$ in each batch training, and $s_{i,y'}$ indicates the classification score of the $i$-th image being classified into another class $y'$ (the sum in the denominator runs over all classes).
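The formula as reconstructed above is the standard softmax cross-entropy, which matches the score definitions; assuming that form, it can be computed in PyTorch as follows, with illustrative tensor sizes:

```python
import torch
import torch.nn.functional as F

scores = torch.randn(8, 10)          # s_{i,y'}: scores of 8 images over 10 classes
labels = torch.randint(0, 10, (8,))  # true class y of each image
# Computes -mean_i log( exp(s_{i,y}) / sum_{y'} exp(s_{i,y'}) ),
# i.e. the batch-averaged softmax cross-entropy.
loss = F.cross_entropy(scores, labels)
```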
After a certain number of training iterations and passing the test on the test set, the pre-training of the wide residual network is completed.
When the pre-training is completed, the portions of the wide residual network except for the fully connected layer are retained as the feature extractor 200 for the image.
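Continuing the hypothetical sketch above, retaining everything except the fully connected layer amounts to keeping only the backbone; `make_wrn()` and the class count below are placeholders:

```python
# Hypothetical continuation of the PretrainNet sketch above.
model = PretrainNet(backbone=make_wrn(), feat_dim=640, num_classes=64)
# ... pre-train and test as in steps S1002-S1003, then:
feature_extractor = model.backbone  # portions except the fully connected layer
feature_extractor.eval()            # inference mode when used as feature extractor 200
```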
In step S102, for an N-class classifier 206 to be generated, N classes are randomly selected from the training set for each of a plurality of times. For the N classes selected each time, step S104 to step S110 are performed.
In step S104, one or more images are randomly selected from each class of the N classes as training samples.
In step S106, a feature vector is extracted from the training samples of each class by using the feature extractor 200.
In the embodiments of the present disclosure, step S106 may further include one of step S1061 and step S1062.
At step S1061, if a single image is selected from each class as a training sample, a feature vector is extracted from each training sample, so that a total of N feature vectors are finally extracted for the N classes.
At step S1062, if a plurality of images are selected from each class as training samples, a plurality of feature vectors are extracted from the plurality of training samples of each class, and an average of the plurality of feature vectors is determined as the feature vector for the class, so that a total of N feature vectors are finally extracted for the N classes.
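A sketch of steps S104 to S106 under the same assumptions, using the hypothetical `feature_extractor` above:

```python
import torch

def class_feature_vectors(feature_extractor, images_per_class):
    """For each of the N selected classes, extract feature vectors from
    its training samples and, when a class has several samples, average
    them (steps S1061/S1062). images_per_class is a list of N tensors,
    each of shape (num_samples, C, H, W)."""
    vectors = []
    for images in images_per_class:
        feats = feature_extractor(images)  # (num_samples, feat_dim)
        vectors.append(feats.mean(dim=0))  # one feature vector per class
    return torch.stack(vectors)            # matrix with N rows (step S1081)
```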
In step S108, the N extracted feature vectors are input into a classifier generator.
In the embodiments of the present disclosure, a specific value of N may be set according to requirements, and the number of images selected from each class may also be set according to experience or requirements.
In step S110, a class information fusion and a parameter prediction for the N-class classifier are sequentially performed by the classifier generator. After step S104 to step S110 are performed for each randomly selected N classes, the N-class classifier 206 is obtained.
The classifier generator in the embodiments of the present disclosure may include a class information fusion module 202 and a classifier parameter prediction module 204.
In the embodiments of the present disclosure, step S108 may further include: at S1081, the feature vectors for the N classes are stitched to form a matrix with N rows; and at S1082, the matrix is input into the class information fusion module 202 to obtain a fusion feature matrix. Each row of the fusion feature matrix indicates a class feature for a corresponding row of the input matrix.
The class information fusion module 202 includes a fully connected layer having N input dimensions and N output dimensions.
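One reading of this module that is consistent with the N-row input matrix is that the fully connected layer mixes information across the N class rows; the following PyTorch sketch makes that assumption explicit:

```python
import torch
import torch.nn as nn

class ClassInfoFusion(nn.Module):
    """Class information fusion module 202: a fully connected layer with
    N input and N output dimensions. Applying it along the class
    dimension is an assumption made here for illustration."""

    def __init__(self, n_classes: int):
        super().__init__()
        self.fc = nn.Linear(n_classes, n_classes)

    def forward(self, feature_matrix: torch.Tensor) -> torch.Tensor:
        # feature_matrix: (N, feat_dim), one row per class (step S1081)
        fused = self.fc(feature_matrix.t()).t()  # mix across the N class rows
        return fused                             # fusion feature matrix, (N, feat_dim)
```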
In the embodiments of the present disclosure, step S110 may further include: at S1101, the fusion feature matrix is input into the classifier parameter prediction module 204 to predict a parameter of the N-class classifier 206.
The classifier parameter prediction module 204 includes a fully connected layer whose input and output dimensions are the same as the dimension of the feature vector of the image.
For example, if the feature extractor 200 obtains a 640-dimensional vector for each input image, the classifier parameter prediction module 204 has 640-dimensional input and output. The classifier parameter prediction module 204 predicts the parameter of the classifier 206 according to the output of the class information fusion module 202. Under the previous assumption, an N×640-dimensional matrix is obtained. This matrix is the final classifier parameter: a 640-dimensional image feature is input into the N-class classifier 206, and an N-dimensional classification score is obtained. The class with the highest score is determined as the predicted class.
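A matching sketch of the classifier parameter prediction module 204 and of applying the predicted parameters as the classifier 206; the 640-dimensional size follows the example above, and all names are illustrative:

```python
import torch
import torch.nn as nn

class ClassifierParamPredictor(nn.Module):
    """Classifier parameter prediction module 204: a fully connected
    layer whose input and output dimensions equal the image feature
    dimension (640 in the example)."""

    def __init__(self, feat_dim: int = 640):
        super().__init__()
        self.fc = nn.Linear(feat_dim, feat_dim)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (N, 640) fusion feature matrix from module 202
        return self.fc(fused)  # (N, 640): the final classifier parameter matrix

# Applying the predicted N x 640 matrix W as the N-class classifier 206:
#   scores = image_feature @ W.t()  -> N-dimensional classification score
#   scores.argmax()                 -> class with the highest score
```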
In addition, in the embodiments of the present disclosure, the method described above may be performed repeatedly, with N classes randomly selected each time, so as to train the classifier generator.
A loss function used in the training process of the N-class classifier is as follows:

$$L = -\sum_{i} \log \frac{\exp(s_{i,y})}{\sum_{y'} \exp(s_{i,y'})}$$

where $s_{i,y}$ indicates the classification score of the $i$-th image to be classified being classified into its true class $y$ in each batch training, and $s_{i,y'}$ indicates the classification score of the $i$-th image being classified into another class $y'$. The loss function is the same as that used in the pre-training of the wide residual network, except for the number of image classes involved.
In addition, when the training is completed, given N new classes and one or more new samples of each class, a new classifier may be directly generated to classify images of these N new classes. In particular, only a single sample of each new class may be used. In practice, if a plurality of samples are available for a class, the average of the feature vectors of these samples may be used instead of the feature vector of a single sample.
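Putting the hypothetical pieces above together, generating and applying a classifier for N new classes might look like this; `feature_extractor`, `fusion`, and `predictor` are the trained modules from the earlier sketches, and `new_images_per_class` and `query_images` are placeholders:

```python
import torch

with torch.no_grad():
    class_vecs = class_feature_vectors(feature_extractor,
                                       new_images_per_class)  # (N, 640)
    weights = predictor(fusion(class_vecs))        # (N, 640) classifier parameters
    query_feats = feature_extractor(query_images)  # (batch, 640) query features
    scores = query_feats @ weights.t()             # (batch, N) classification scores
    predictions = scores.argmax(dim=1)             # highest score = predicted class
```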
The above solution according to the embodiments of the present disclosure is based entirely on a 2D convolutional neural network. According to this solution, after training on a large training set, a classifier identifying new classes may be generated from a small number of samples of the new classes. In tests on a dedicated few-shot learning dataset, for the generation of a 5-class classifier (that is, N=5), the classification accuracy of the generated classifier reaches 60.04% when only one sample is used for each new class, and 74.15% when five samples are used for each new class.
Through the description of the above embodiments, those skilled in the art may clearly understand that the above embodiments may be implemented by software, or by software together with a necessary general hardware platform. Based on this understanding, the technical solutions of the above embodiments may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash disk, a removable hard disk, etc.). The non-volatile storage medium includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in various embodiments of the present disclosure.
The above-mentioned technical solutions of the present disclosure provide a method of generating a classifier. When classifying images of a new class, a new classifier may be generated by using a small number of images of the new class, so as to ensure the accuracy of image classification.
The above are only preferred specific implementations of the present disclosure, and the scope of protection of the present disclosure is not limited thereto. Any changes or substitutions that may be easily conceived by those skilled in the art within the technical scope disclosed in the present disclosure should be covered by the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure should be determined by the scope of protection defined by the claims.
Number | Date | Country | Kind |
---|---|---|---|
201910235392.2 | Mar 2019 | CN | national |
The present disclosure is a Section 371 National Stage Application of PCT International Application No. PCT/CN2020/079018, filed on Mar. 12, 2020, entitled “METHOD OF GENERATING CLASSIFIER BY USING SMALL NUMBER OF LABELED IMAGES”, and the PCT International application claims priority to Chinese Patent Application No. 201910235392.2, filed on Mar. 26, 2019, which are incorporated herein by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2020/079018 | 3/12/2020 | WO | 00 |