This application claims priority to and the benefit of Korean Patent Application Nos. 10-2023-0017184, filed on Feb. 9, 2023, and 10-2023-0072979, filed on Jun. 7, 2023, the disclosures of which are incorporated herein by reference in their entirety.
The present disclosure relates to a system and method for classifying novel class objects.
An object classification model according to the prior art predefines a set of base classes to be classified and performs object classification assuming that an input image contains only objects from these predefined classes. Hence, in order to apply such an object classification model to real-world applications where novel class objects appear, it is necessary to acquire a large amount of training data for these novel classes and train the object classification model using the training dataset, which incurs considerable time and cost.
A method of classifying novel class objects according to the prior art requires more training time since it is trained with only a limited amount of training data from a randomly initialized state. In addition, it often leads to reduced object classification performance due to overfitting or unexpected random effects from its initial values.
Various embodiments are directed to a system and method for classifying novel class objects, which are capable of setting up a novel classifier model by utilizing parameterization that leverages prior knowledge from a base classifier, thereby allowing the training process to be expedited and improving object classification performance compared to the prior art.
In accordance with an aspect of the present disclosure, there is provided a method of classifying novel class objects, which includes (a) constructing a novel classifier considering prior knowledge acquired from a base classifier, and (b) learning a parameterized weight coefficient of a novel classifier model during the training of the novel classifier.
In (a) above, a parameter of the base classifier previously learned for a set of base classes may be considered as the prior knowledge.
In (a) above, a preset number of Gaussian random vectors may be used as additional basis vectors to construct the novel classifier capable of expressing any novel class.
In (b) above, a parameter of the novel classifier model may be parameterized with the weight coefficient, and the weight coefficient may then be updated to streamline the training of the novel classifier.
The method of classifying novel class objects may further include (c) performing object recognition for a novel class by applying it to an object recognition model. In (c) above, a feature map (hereinafter, referred to as “first feature map”) may be obtained through a backbone network by processing an input image, a feature map in a feature pyramid network (hereinafter, referred to as “FPN feature map”) may be obtained using the first feature map, and a result of object recognition in the input image may be output.
In (c) above, the FPN feature map may be obtained by attaching a convolutional layer to the first feature map or by merging a result of attaching the convolutional layer to the first feature map with an upsampled FPN feature map for calculation.
In (c) above, a classification head, a centerness head, a regression head, and a controller head associated with the FPN feature map may be used, where the controller head may be used to set a parameter of a mask head for object recognition, and the result of object recognition is output through an instance-wise mask head.
In (c) above, an objective function used for training may be constructed through a combination of an objective function to improve object classification performance, an objective function to find object centerness, an objective function for object bounding box regression, and an objective function for mask-based object recognition, and the object recognition model may be trained by fine-tuning the classification head, centerness head, regression head, and controller head of the object recognition model.
In accordance with another aspect of the present disclosure, there is provided a system for classifying novel class objects, which includes an input interface device configured to receive prior knowledge from a base classifier, a memory configured to store a program that constructs and trains a novel classifier by considering the prior knowledge of the base classifier, and a processor configured to execute the program. The program involves learning a parameterized weight coefficient of a novel classifier model during the training of the novel classifier.
The prior knowledge may be a parameter of the base classifier previously learned for a set of base classes.
The processor may use a preset number of Gaussian random vectors as additional basis vectors to construct the novel classifier.
The processor may parameterize a parameter of the novel classifier model with the weight coefficient.
The processor may apply a method for classifying novel class objects to an object recognition model, to obtain a first feature map through a backbone network by processing an input image, to obtain a feature pyramid network (FPN) feature map using the first feature map, and to output a result of object recognition in the input image.
The processor may attach a convolutional layer to the first feature map or merge a result of attaching the convolutional layer to the first feature map with an upsampled FPN feature map for calculation.
The processor may output the result of object recognition using a classification head, a centerness head, a regression head, and a controller head associated with the FPN feature map.
The processor may construct an objective function used for training through a combination of an objective function to improve object classification performance, an objective function to find object centerness, an objective function for object bounding box regression, and an objective function for mask-based object recognition, and perform object recognition model training by fine-tuning the classification head, centerness head, regression head, and controller head of the object recognition model.
The above and other objects, advantages, and features of the present disclosure and methods of achieving them will become apparent with reference to the embodiments described below in detail in conjunction with the accompanying drawings.
The present disclosure may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. The following embodiments are provided solely to fully convey the purpose, configuration, and effect of the disclosure to those of ordinary skill in the art to which the present disclosure pertains, and the scope of the present disclosure is defined by the appended claims.
Meanwhile, the terms used herein are for the purpose of describing the embodiments and are not intended to limit the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless context clearly indicates otherwise. It will be understood that the terms “comprises”/“includes” and/or “comprising”/“including” when used in the specification, specify the presence of stated components, steps, motions, and/or elements, but do not preclude the presence or addition of one or more other components, steps, motions, and/or elements.
Object classification is one of the most fundamental tasks for understanding images in visual intelligence research. Object classification involves identifying the type of object contained in an input image I and assigning a class (c∈C) to that object. Object classification is not only important in itself for understanding images, but is also essential for addressing other detailed image understanding problems such as object detection or instance segmentation, and has a significant impact on the performance of each of these tasks.
The process of object classification according to the prior art is as follows. A sophisticated artificial intelligence model for object classification and a training dataset D consisting of a large amount of training data to be used when training the artificial intelligence model are prepared. In this case, each element of the training dataset D is composed of a pair (I, AI) of an image I and a result of object classification AI of I annotated by a person. For a predefined set of base classes CB to be classified, AI may be configured in different forms depending on visual intelligence applications including object classification.
For example, AI may include class information (c∈CB) of the object appearing in the image I. In another example, AI may be composed of a pair (ci, bi) (1≤i≤NI), each consisting of class information (ci ∈CB) of each of the NI objects appearing in the image I and bounding-box-level information (bi) indicating the location of each object. In a further example, AI may be composed of a pair (ci, mi) (1≤i≤NI), each consisting of class information (ci ∈CB) of each of the NI objects appearing in the image I and pixel-level information (mi) indicating the location of each object.
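As an illustration of the three annotation forms just described, the following minimal sketch shows how the elements of such a training dataset might be represented in Python; the structure and field names are hypothetical and are not part of the original disclosure.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class BoxAnnotation:
    class_id: int        # c_i, a class in the set of base classes CB
    box: List[float]     # b_i, bounding-box-level location, e.g. [x_min, y_min, x_max, y_max]

@dataclass
class MaskAnnotation:
    class_id: int        # c_i, a class in the set of base classes CB
    mask: np.ndarray     # m_i, pixel-level location as a binary mask of shape (H, W)

# A training example is a pair (I, AI). Depending on the application, AI may be
# a single class id, a list of BoxAnnotation, or a list of MaskAnnotation.
```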
In the training stage of an initial artificial intelligence model, the artificial intelligence model is updated repeatedly so that the result of object classification output by the model is close to the result of object classification AI written by a person for each image I of the training dataset D. Using the artificial intelligence model trained in this way, object classification is performed on any image in the actual testing stage.
An object classification model according to the prior art predefines a set of base classes (CB) to be classified, and then performs object classification assuming that an input image I contains only objects of the class belonging to CB.
However, a large number of novel class objects may appear in many real-world applications that require visual intelligence technology. In particular, these novel classes may vary depending on the specific application of the visual intelligence model.
Hence, in order to use a conventional object classification method, it is necessary to acquire a large amount of training data for these novel classes and train the object classification model for each specific application, which inevitably incurs considerable time and cost.
To solve the aforementioned problem, there has been proposed a method of adding, in addition to a base classifier for classifying objects belonging to a predefined set of base classes CB, a novel classifier for classifying objects belonging to a set of novel classes CN, and quickly training the novel classifier.
A representative approach is low-shot learning, which involves training with a very small number of examples. In low-shot learning, the model is trained using 30 or fewer examples for each class in the set of novel classes CN.
The process of classifying novel class objects using low-shot learning is as follows. First, a base classifier that classifies objects belonging to a set of base classes CB is trained in the same way as the conventional object classification model. Next, a randomly initialized novel classifier that classifies objects belonging to a set of novel classes CN is added to the trained object classification model and then trained using a training dataset for the novel classes. In this case, training only the novel classifier without updating the rest of the object classification model may exhibit better performance in most cases.
The method described above for classifying novel class objects requires more training time since it is trained with only a limited amount of training data from a randomly initialized state. In addition, it may lead to reduced object classification performance due to overfitting or unexpected random effects from its initial values.
In order to solve the aforementioned problem, the present disclosure proposes a method of constructing a novel classifier model by utilizing prior knowledge acquired from a base classifier that is trained with a large amount of training data.
According to the embodiment of the present disclosure, a parameter θN of a novel classifier is modeled by incorporating a parameter θB of a base classifier, which has been previously learned for a set of base classes (CB) as prior knowledge.
The parameter θN of the novel classifier according to the embodiment of the present disclosure is defined as Equation 1 below.

θN = θB·α   [Equation 1]

Here, θB ∈ ℝ^(d×|CB|) denotes the parameter of the base classifier previously learned for the set of base classes CB, α denotes the parameterized weight coefficient, and d denotes the dimension of the classifier.
According to the embodiment of the present disclosure, the novel classifier is trained by parameterizing the parameter θN of the novel classifier model with the weight coefficient α and updating the weight coefficient α.
The novel classifier according to the embodiment of the present disclosure is modeled as a linear combination of the parameters of the base classifier, so that the novel classifier is constructed by expressing each novel class through an appropriate combination of the prior knowledge of the base classifier.
The dimension d of each classifier has a large value such as 256 or 512, while the number of base classifiers, namely, the number of base classes |CB|, has a relatively small value. For example, for the MS-COCO dataset, the number of base classes |CB| used in typical low-shot learning is 60.
As such, if the number of base classifiers |CB| is less than the dimension d of the classifier, the subspace that may be created by the base classifiers is inherently limited. This means that a large null space is created that may not be expressed with only the base classifiers. For example, with d = 256 and |CB| = 60, the base classifiers span at most a 60-dimensional subspace of the 256-dimensional feature space, leaving a null space of at least 196 dimensions. The existence of such a null space implies that not every novel class can be expressed exclusively with the prior knowledge of the base classifier.
In order to overcome the above limitation, the present disclosure constructs a novel classifier that can express any novel class by utilizing r Gaussian random vectors R ∈ ℝ^(d×r) as additional basis vectors, so that θN = [θB, R]·α with α ∈ ℝ^((|CB|+r)×|CN|). This ensures sufficient expressive power for novel classes, thereby leading to improved object classification performance compared to conventional object classification methods.
When training the novel classifier according to the embodiment of the present disclosure, the parameterized weight coefficient α is learned, rather than all of the parameters θN of the novel classifier model.
Compared to the prior art where the number of parameters to be learned for the novel classifier is d|CN|, the number of parameters to be learned for the novel classifier according to the embodiment of the present disclosure is (|CB|+r)×|CN|.
Usually, the dimension d of the classifier has a large value, while |CB|+r has a smaller value. Hence, the number of parameters to be learned according to the embodiment of the present disclosure is significantly reduced.
Therefore, according to the embodiment of the present disclosure, the training of the novel classifier can proceed more quickly, and efficient training is possible without overfitting in situations where the novel classifier is to be trained with a small number of training data, such as in low-shot learning settings.
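For example, assuming for illustration |CN| = 20 novel classes and r = 60 Gaussian random vectors (neither value is specified in the text), with d = 256 and |CB| = 60 the prior art would learn d×|CN| = 5,120 parameters for the novel classifier, whereas the parameterization above would learn only (|CB|+r)×|CN| = 2,400. A minimal PyTorch-style sketch of this parameterization is shown below; the framework, variable names, and initialization are illustrative assumptions, and only the weight coefficient α is trainable while θB and R are kept fixed.

```python
import torch
import torch.nn as nn

class NovelClassifier(nn.Module):
    """Novel classifier parameterized by prior knowledge of the base classifier."""

    def __init__(self, theta_b: torch.Tensor, num_novel: int, r: int = 60):
        super().__init__()
        d, num_base = theta_b.shape                              # theta_B: (d, |CB|)
        basis = torch.cat([theta_b, torch.randn(d, r)], dim=1)   # [theta_B, R]: (d, |CB| + r)
        self.register_buffer("basis", basis)                     # fixed basis, not trained
        # alpha: (|CB| + r, |CN|) -- the only trainable parameters
        self.alpha = nn.Parameter(torch.empty(num_base + r, num_novel))
        nn.init.normal_(self.alpha, std=0.01)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        theta_n = self.basis @ self.alpha    # theta_N = [theta_B, R] @ alpha: (d, |CN|)
        return features @ theta_n            # novel-class scores for features of shape (B, d)
```

In a training loop, only `model.alpha` would be passed to the optimizer, which realizes the reduced parameter count discussed above.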
The method of classifying novel class objects according to the embodiment of the present disclosure may be applied to the object recognition model described below.
When an input image is entered, the object recognition model obtains feature maps C3, C4, and C5 by processing the input image through a backbone network.
The feature maps P3, P4, P5, P6, and P7 in the feature pyramid network (FPN) may be obtained using C3, C4, and C5.
In this case, P5 is calculated by attaching a 1×1 convolutional layer to C5, P3 and P4 are calculated by attaching 1×1 convolutional layers to C3 and C4 and then merging them with results of upsampling P4 and P5, respectively, and P6 and P7 are calculated by attaching 1×1 convolutional layers to P5 and P6, respectively.
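A simplified sketch of this FPN feature-map construction is shown below; the output channel width (256), nearest-neighbor upsampling, and the use of stride-2 convolutions to obtain the coarser P6 and P7 levels are assumptions, while the lateral 1×1 convolutions and the top-down merging follow the description above.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Builds FPN feature maps P3-P7 from backbone feature maps C3, C4, and C5."""

    def __init__(self, c3_ch: int, c4_ch: int, c5_ch: int, out_ch: int = 256):
        super().__init__()
        self.lat3 = nn.Conv2d(c3_ch, out_ch, kernel_size=1)   # 1x1 conv attached to C3
        self.lat4 = nn.Conv2d(c4_ch, out_ch, kernel_size=1)   # 1x1 conv attached to C4
        self.lat5 = nn.Conv2d(c5_ch, out_ch, kernel_size=1)   # 1x1 conv attached to C5
        self.p6_conv = nn.Conv2d(out_ch, out_ch, kernel_size=1, stride=2)  # extra level from P5
        self.p7_conv = nn.Conv2d(out_ch, out_ch, kernel_size=1, stride=2)  # extra level from P6

    def forward(self, c3, c4, c5):
        p5 = self.lat5(c5)
        p4 = self.lat4(c4) + F.interpolate(p5, size=c4.shape[-2:])  # merge with upsampled P5
        p3 = self.lat3(c3) + F.interpolate(p4, size=c3.shape[-2:])  # merge with upsampled P4
        p6 = self.p6_conv(p5)
        p7 = self.p7_conv(p6)
        return p3, p4, p5, p6, p7
```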
For the feature maps P3, P4, P5, P6, and P7, there are a classification head for separate object classification, a centerness head for finding object centerness, a regression head for bounding box regression, and a controller head for object recognition.
The controller head may be used to set a mask head parameter for object recognition.
The result of upsampling the feature maps P4 and P5 and adding them to P3 passes through several convolutional layers and an instance-wise mask head to output the result of recognizing each object appearing in a corresponding image.
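The per-level heads and the dynamically configured mask head might be organized as in the following sketch; the number of classes, channel widths, and the number of mask-head parameters produced by the controller head are illustrative, and generating the instance-wise mask head weights from the controller head output (dynamic convolution) is one plausible reading of "setting the mask head parameter."

```python
import torch.nn as nn

class PredictionHeads(nn.Module):
    """Heads attached to each FPN feature map P3-P7."""

    def __init__(self, in_ch: int = 256, num_classes: int = 80, ctrl_dim: int = 169):
        super().__init__()
        self.cls_head = nn.Conv2d(in_ch, num_classes, 3, padding=1)  # object classification
        self.ctr_head = nn.Conv2d(in_ch, 1, 3, padding=1)            # object centerness
        self.reg_head = nn.Conv2d(in_ch, 4, 3, padding=1)            # bounding box regression
        self.ctrl_head = nn.Conv2d(in_ch, ctrl_dim, 3, padding=1)    # assumed size of the generated mask-head parameters

    def forward(self, p):
        return self.cls_head(p), self.ctr_head(p), self.reg_head(p), self.ctrl_head(p)
```

In this reading, the controller head output at each object location supplies the weights of a small per-instance mask head, which is applied to the merged P3-level feature map to produce the recognition result for that object.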
The training process of the object recognition model is as follows. The object recognition model is first trained for the set of base classes CB, with the objective function constructed through a combination of an objective function to improve object classification performance, an objective function to find object centerness, an objective function for object bounding box regression, and an objective function for mask-based object recognition.
After the object recognition model is trained for the base class, the classification head, centerness head, regression head, and controller head of the object recognition model are fine-tuned using the training dataset consisting of the set of novel classes CN to recognize novel class objects.
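A sketch of how this combined objective and the head-only fine-tuning could be expressed is given below; the relative loss weights and the assumption that the backbone and FPN are frozen during fine-tuning are illustrative, since the text only specifies that the four objectives are combined and that the four heads are fine-tuned.

```python
def total_loss(cls_loss, ctr_loss, reg_loss, mask_loss,
               w_cls=1.0, w_ctr=1.0, w_reg=1.0, w_mask=1.0):
    """Combined objective: classification + centerness + box regression + mask recognition."""
    return w_cls * cls_loss + w_ctr * ctr_loss + w_reg * reg_loss + w_mask * mask_loss


def set_novel_finetune_mode(model):
    """Fine-tune only the classification, centerness, regression, and controller heads
    (assumes the head attributes of the PredictionHeads sketch above)."""
    for p in model.parameters():
        p.requires_grad = False
    for head in (model.cls_head, model.ctr_head, model.reg_head, model.ctrl_head):
        for p in head.parameters():
            p.requires_grad = True
```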
The training method of the object recognition model according to the embodiment of the present disclosure may be changed in detail depending on the training dataset, the training scope (training the entire model or only the head part, etc.), and the training method (supervised/unsupervised/weakly supervised training, etc.).
Referring to the accompanying drawing, the method of classifying novel class objects according to the embodiment of the present disclosure includes the steps of: constructing a novel classifier model considering prior knowledge acquired from a base classifier (S310) and learning a parameterized weight coefficient of the novel classifier model during the training of the novel classifier (S320).
In step S310, a parameter of the base classifier previously learned for a set of base classes is considered as the prior knowledge.
In step S310, a preset number of Gaussian random vectors is used as additional basis vectors to construct the novel classifier capable of expressing any novel class.
In step S320, a parameter of the novel classifier model is parameterized with the weight coefficient and the weight coefficient is updated to streamline the training of the novel classifier.
The method of classifying novel class objects according to the embodiment of the present disclosure further includes the step of performing object recognition for a novel class by applying it to an object recognition model (S330). In step S330, a feature map (hereinafter, referred to as “first feature map”) is obtained through a backbone network by processing an input image, a feature map in a feature pyramid network (hereinafter, referred to as “FPN feature map”) is obtained using the first feature map, and a result of object recognition in the input image is output.
In step S330, the FPN feature map is obtained by attaching a convolutional layer to the first feature map or by merging a result of attaching the convolutional layer to the first feature map with an upsampled FPN feature map for calculation.
In step S330, a classification head, a centerness head, a regression head, and a controller head associated with the FPN feature map are used, where the controller head is used to set a parameter of a mask head for object recognition, and the result of object recognition is output through an instance-wise mask head.
In step S330, an objective function used for training is constructed through a combination of an objective function to improve object classification performance, an objective function to find object centerness, an objective function for object bounding box regression, and an objective function for mask-based object recognition, and the object recognition model is trained by fine-tuning the classification head, centerness head, regression head, and controller head of the object recognition model.
A system for classifying novel class objects according to an embodiment of the present disclosure includes an input interface device that receives prior knowledge from a base classifier, a memory storing a program that constructs and trains a novel classifier by considering the prior knowledge of the base classifier, and a processor that executes the program, which involves learning a parameterized weight coefficient of a novel classifier model during the training of the novel classifier.
The prior knowledge is a parameter of the base classifier previously learned for a set of base classes.
The processor uses a preset number of Gaussian random vectors as additional basis vectors to construct the novel classifier.
The processor parameterizes a parameter of the novel classifier model with the weight coefficient.
The processor applies a method for classifying novel class objects to an object recognition model, to obtain a first feature map through a backbone network by processing an input image, to obtain a feature pyramid network (FPN) feature map using the first feature map, and to output a result of object recognition in the input image.
The processor attaches a convolutional layer to the first feature map or merges a result of attaching the convolutional layer to the first feature map with an upsampled FPN feature map for calculation.
The processor outputs the result of object recognition using a classification head, a centerness head, a regression head, and a controller head associated with the FPN feature map.
The processor constructs an objective function used for training through a combination of an objective function to improve object classification performance, an objective function to find object centerness, an objective function for object bounding box regression, and an objective function for mask-based object recognition, and performs object recognition model training by fine-tuning the classification head, centerness head, regression head, and controller head of the object recognition model.
Referring to the accompanying drawing, the system for classifying novel class objects according to the embodiment of the present disclosure may be implemented as a computer system including a processor, a memory, a communication device, and input and output interface devices.
Accordingly, the embodiment of the present disclosure may be embodied by a computer-implemented method or by a non-transitory computer-readable medium storing computer-executable instructions. In an embodiment, computer readable instructions may perform, when executed by a processor, a method according to at least one aspect of the present disclosure.
The communication device 1020 may transmit or receive a wired signal or a wireless signal.
In addition, the method according to the embodiment of the present disclosure may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium.
The computer-readable medium may include program instructions, data files, data structures, etc., alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured for embodiments of the present disclosure, or may be known and usable by those skilled in the art of computer software. The computer-readable recording medium may include a hardware device configured to store and perform program instructions. For example, the computer-readable recording medium may be a magnetic medium such as a hard disk, floppy disk, or magnetic tape; an optical medium such as a CD-ROM or DVD; a magneto-optical medium such as a floptical disk; or a memory such as ROM, RAM, or flash memory. The program instructions may include not only machine language code such as that created by a compiler, but also high-level language code that can be executed by a computer through an interpreter or the like.
Each row in the visualization represents a base class and each column represents a novel class. Each element means a normalized value of the weight coefficient of the base class corresponding to each row in the novel classifier corresponding to each column.
In other words, each element indicates how much of the classifier of each base class is used when the novel classifier is constructed. It appears closer to 1.00 as the value of the corresponding element increases, and it appears closer to −1.00 as the value of the corresponding element decreases.
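The exact normalization used for this visualization is not specified; a minimal sketch, assuming that each novel class's weight coefficients (one column of α) are scaled by their largest absolute value so that they fall in the range [-1.00, 1.00], is shown below.

```python
import numpy as np

def normalize_alpha_for_visualization(alpha: np.ndarray) -> np.ndarray:
    """Scale each column (novel class) of the weight coefficient alpha to [-1, 1]."""
    scale = np.max(np.abs(alpha), axis=0, keepdims=True)  # per-column maximum magnitude
    return alpha / np.maximum(scale, 1e-12)               # avoid division by zero
```

In the visualization, only the rows corresponding to the |CB| base classes would be displayed, one column per novel class.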
Since the low-shot learning-based object recognition model is trained with only a limited amount of training data, the overall classification performance for novel class objects is inevitably degraded to some extent.
The left side of the comparison illustrates a result of novel class object recognition by the object recognition model according to the prior art. As can be seen in the box area at the bottom right, even though an object clearly exists in the input image, the prior art fails to classify the object as the corresponding novel class, so the object is not output as an object recognition result. The right side illustrates a result of novel class object recognition by the object recognition model according to the embodiment of the present disclosure.
As apparent from the above description, according to the present disclosure, it is possible to speed up training and improve object classification performance even with a limited amount of training data, by parameterizing the novel classifier using the prior knowledge of the pre-trained base classifier for classifying base class objects when attempting to perform the task of novel class object classification using the artificial intelligence model such as a deep artificial neural network.
The present disclosure is not limited to the above effect, and other effects of the present disclosure will be clearly understood by those skilled in the art from the above description.
Although the specific embodiments have been described with reference to the drawings, the present disclosure is not limited thereto. It will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the disclosure as defined in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2023-0017184 | Feb 2023 | KR | national |
10-2023-0072979 | Jun 2023 | KR | national |