Deep neural networks have enormous potential for understanding natural images. The learning ability of deep neural networks increases significantly with more labeled training data. However, annotating such data is expensive, time-consuming and laborious. Furthermore, some classes (e.g., in medical images) are naturally rare and hard to collect. Conventional training approaches for deep neural networks often fail to obtain good performance when the training data is insufficient. Considering that humans can easily learn from very few examples and even generalize to many different new images, it would be greatly beneficial if a network could likewise learn to generalize to new classes given only a few labeled samples from those unseen classes.
Known methods for few-shot learning generally fall into one of two categories. The first is the meta-based methods, which model the few-shot learning process with samples belonging to the base classes and optimize the model for the target novel classes. The second is the plain solution (non-meta-based, also known as the baseline method), which trains a feature extractor on the abundant base classes and then directly predicts the classifier weights for the novel ones.
Because the number of images in the support set of novel classes is extremely limited, directly training models from scratch on the support set is unstable and tends to overfit. Even utilizing parameters pre-trained on the base classes and fine-tuning all layers on the support set leads to poor performance, due to the small proportion of target training data.
A common practice, used by both meta-based and simple baseline methods, relies heavily on knowledge pre-trained from the sufficient base classes. The representation is then transferred by freezing the backbone parameters and solely fine-tuning the last fully-connected layer, or by directly extracting features for distance computation on the support data, to prevent overfitting and improve generalization. However, the base classes have no overlap with the novel ones, meaning that the representations and distributions required to recognize their images differ considerably. Completely freezing the backbone network and simply transferring the whole of the knowledge therefore suffers from this domain discrepancy.
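For concreteness, the following is a minimal PyTorch sketch of this conventional fixed-transfer baseline; the backbone architecture, feature dimension, and learning rate are illustrative assumptions rather than part of the disclosure.

```python
import torch
import torch.nn as nn

# Illustrative pre-trained backbone and a fresh head for N = 5 novel classes.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
classifier = nn.Linear(64, 5)

# Conventional fixed transfer: freeze every backbone parameter ...
for p in backbone.parameters():
    p.requires_grad = False

# ... and fine-tune only the final fully-connected layer on the support set.
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)
```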
Disclosed herein is a method that provides a flexible way to transfer knowledge from base to novel classes. The invention introduces a partial transfer paradigm for the few-shot classification task, shown schematically in
During searching, the validation data serves as the ground truth for monitoring the performance of each candidate search strategy. This strategy can achieve a better trade-off between using knowledge from the base data and from the support data than previous approaches, while avoiding the incorporation of biased or harmful knowledge from the base classes into the novel classes. Moreover, the disclosed method is orthogonal to meta-learning and non-meta-based solutions, and thus can be seamlessly integrated with either.
The novel aspects of the invention can be summarized as follows. First, disclosed herein is Partial Transfer (P-Transfer) for few-shot classification, a framework that enables searching transfer strategies on the backbone for flexible fine-tuning. Conventional fixed transferring is a special case of the disclosed strategy in which all layers are frozen. Second, disclosed herein is a layer-wise search space for fine-tuning from base classes to novel classes, which helps the searched transfer strategy obtain strong accuracy under limited search complexity.
By way of example, a specific exemplary embodiment of the disclosed system and method will now be described, with reference to the accompanying drawings, in which:
The method for partial transfer in few-shot learning, referred to herein as P-Transfer, will now be disclosed with reference to
In the few-shot classification task, given abundant labeled images X_b in base classes L_b and a small number of labeled images X_n in novel classes L_n, wherein L_b∩L_n=∅, the goal is to train models for recognizing the novel classes using the large amount of labeled base data and the limited novel data. Considering an N-way K-shot few-shot task, where the support set on the novel classes has N classes with K labeled images each and the query set contains the same N classes with Q unlabeled images in each class, the few-shot classification algorithms are required to learn classifiers for recognizing the N×Q images in the query set of the N classes.
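By way of illustration, the following is a minimal sketch of how such an N-way K-shot episode might be sampled; the `images_by_class` mapping (class name to list of images) and the helper name `sample_episode` are hypothetical, not part of the disclosure.

```python
import random

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15):
    """Sample one N-way K-shot episode from the novel classes:
    a support set of N*K labeled images and a query set of N*Q
    images drawn from the same N classes."""
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = random.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query
```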
The objective of P-Transfer is to discover the best transfer learning scheme V*_lr, such that the network achieves maximal accuracy when fine-tuning under that scheme:

V*_lr = argmax_{V_lr} Acc(W, V_lr)   (1)

where W denotes the network weights and Acc(W, V_lr) is the validation accuracy obtained by fine-tuning W under the layer-wise learning-rate scheme V_lr.
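For example, for a six-layer backbone a candidate scheme can be written as a vector V_lr = (0, 0.01, 0, 0.1, 0.1, 1.0), where the i-th entry is the learning rate assigned to the i-th layer and an entry of 0 indicates that the layer is frozen during fine-tuning.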
As shown in
Step 1: Base Class Pre-Training—Base class pre-training is the fundamental step of the pipeline. As shown in
Step 2: Evolutionary Search—The second step is to perform an evolutionary search over different fine-tuning strategies to determine which layers will be fixed and which layers will be fine-tuned in the representation transfer stage. Both the simple baseline (pre-training plus fine-tuning) and meta-based methods are considered. In these two scenarios the evolutionary search operations differ slightly, as shown in
Generally, the method searches for the optimal strategy for transferring from the base classes to the novel classes by fixing or re-activating particular layers that can help the novel classes.
Step 3: Partial Transfer via Searched Strategy—As shown in
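As one illustrative sketch, a searched strategy can be applied in PyTorch by giving each layer its own learning rate through optimizer parameter groups; the three-layer network and the particular rates below are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

# Illustrative three-layer network; a real backbone has more layers.
layers = nn.ModuleList([
    nn.Conv2d(3, 64, 3, padding=1),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.Linear(64, 5),
])

# A searched strategy V_lr: one learning rate per layer, 0 => frozen.
v_lr = [0.0, 0.01, 0.1]

param_groups = []
for layer, lr in zip(layers, v_lr):
    if lr == 0.0:
        for p in layer.parameters():
            p.requires_grad = False   # this layer stays fixed during transfer
    else:
        param_groups.append({"params": layer.parameters(), "lr": lr})

# Each trainable layer is fine-tuned with its own searched learning rate.
optimizer = torch.optim.SGD(param_groups, lr=0.01, momentum=0.9)
```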
The search space is related to the model architecture utilized for the few-shot classification. Generally, it contains the layer-level selection (fine-tuning or freezing) and the learning rate assignment for fine-tuning. The search space can be formulated as m^K, where m is the number of choices for learning rate values and K is the number of layers in the network. For example, the set {0, 0.01, 0.1, 1.0} could be chosen as the learning rate zoo (i.e., m=4), wherein a learning rate of 0 indicates that the layer is frozen during fine-tuning. For a Conv6 structure, the search space then includes 4^6=4096 possible transfer strategies. The searching method can automatically match the optimal choice for each layer from the learning rate zoo during fine-tuning. A brief comparison of the search space is shown in Table 1. The search space increases sharply if deeper networks are chosen.
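The size of this search space can be verified with a short sketch; the learning-rate zoo and layer count below follow the Conv6 example above.

```python
from itertools import product

lr_zoo = [0.0, 0.01, 0.1, 1.0]   # m = 4 choices; 0 means "freeze this layer"
num_layers = 6                   # K = 6, e.g., a Conv6 backbone

# The layer-wise search space: every assignment of one rate per layer.
strategies = list(product(lr_zoo, repeat=num_layers))
print(len(strategies))           # 4 ** 6 = 4096 candidate transfer strategies
```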
The searching step follows an evolutionary algorithm. Evolutionary algorithms (a.k.a. genetic algorithms) are based on the natural evolution of species. They contain reproduction, crossover (swapping parts of the elements of the learning strategy vectors), and mutation (flipping some elements of the learning strategy vectors) stages. Here, a population of strategies is first encoded as vectors and initialized randomly. Each individual consists of its strategy for fine-tuning. After initialization, each individual strategy is evaluated to obtain its accuracy on the validation set. Among these evaluated strategies, the top K are selected as parents to produce the next generation of strategies, which are created through the mutation and crossover stages. By repeating this process over iterations, the fine-tuning strategy with the best validation performance can be discovered. One embodiment of a detailed search pipeline is presented in
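The following is a minimal sketch of such an evolutionary search; the population size, mutation probability, and the assumed `fitness` function (which fine-tunes under a given strategy and returns validation accuracy) are illustrative hyper-parameters, not prescribed values.

```python
import random

def evolutionary_search(num_layers, fitness, lr_zoo=(0.0, 0.01, 0.1, 1.0),
                        pop_size=20, top_k=5, generations=10, mut_prob=0.1):
    """Sketch of the evolutionary search over fine-tuning strategies.
    `fitness(strategy)` is assumed to fine-tune the model under `strategy`
    and return its accuracy on the validation set."""
    # Initialize a random population of layer-wise learning-rate vectors.
    population = [tuple(random.choice(lr_zoo) for _ in range(num_layers))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Select the top-K strategies as parents for the next generation.
        parents = sorted(population, key=fitness, reverse=True)[:top_k]
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, num_layers)        # crossover point
            child = list(a[:cut] + b[cut:])
            for i in range(num_layers):                  # mutation
                if random.random() < mut_prob:
                    child[i] = random.choice(lr_zoo)
            children.append(tuple(child))
        population = children
        best = max(population + [best], key=fitness)
    return best
```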
As shown in
For Use With Simple Baseline++ Methods—Baseline++ methods aim to explicitly reduce intra-class variation among features by applying a cosine distance between the feature vector and the classifier weight vectors in the training and fine-tuning stages. As shown in
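For illustration, a Baseline++-style cosine classifier might be sketched as follows; the module name, scale factor, and weight initialization are assumptions, not details taken from the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Sketch of a Baseline++-style cosine classifier: the logit for each
    class is the scaled cosine similarity between the feature vector and
    that class's weight vector, reducing intra-class variation."""
    def __init__(self, feat_dim, num_classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # temperature; the value here is an assumption

    def forward(self, features):
        f = F.normalize(features, dim=-1)   # unit-norm features
        w = F.normalize(self.weight, dim=-1)  # unit-norm class weights
        return self.scale * f @ w.t()       # cosine-similarity logits
```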
For Use With Meta-Learning-Based Methods—
In few-shot learning, the pre-trained feature extractor is required to provide proper transferability from the base classes to one or more novel classes in the meta or non-meta learning stage. Transfer learning aims to carry the common knowledge from the base objects to the novel classes. However, as discussed, there may be some unnecessary and even harmful information in the base classes. Because the novel data are few in number and sensitive to the feature extractor, a complete transferring strategy cannot avoid this unnecessary and harmful information, indicating that the method disclosed herein is a better solution for the few-shot scenario.
Usually, the base and novel classes are in the same domain, so using a feature extractor pre-trained on the base data and then transferring to the novel data can obtain good or moderate performance. However, in cross-domain transfer learning, more layers need to be fine-tuned to adapt the knowledge to the target domain, since the source and target domains differ in content. In this circumstance, conventional transfer learning is no longer applicable. The disclosed method of partial transferring with diverse learning rates on different layers is well suited to this intractable situation; intuitively, fixed transferring is a special case of the disclosed strategy, which accordingly has greater potential in few-shot learning.
Disclosed herein is a partial transfer (P-Transfer) method for few-shot classification. The method transfers knowledge from base classes to novel classes through searched strategies in few-shot scenarios without any proxy. The method boosts both meta-based and non-meta-based methods by a large margin, as the flexible transfer/fine-tuning uses the few support samples to adjust the backbone parameters. Intuitively, the P-Transfer method has greater potential for few-shot classification and even for traditional transfer learning.
As would be realized by one of skill in the art, the methods described herein can be implemented by a system comprising a processor and memory, storing software that, when executed by the processor, performs the functions comprising the method.
This application claims the benefit of U.S. Provisional Patent Application No. 63/146,274, filed Feb. 5, 2021, the contents of which are incorporated herein in their entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US22/13495 | 1/24/2022 | WO |

Number | Date | Country
---|---|---
63146274 | Feb 2021 | US