A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present disclosure relates generally to machine learning models and neural networks, and more specifically, to noise-resistant object detection with noisy annotations.
Artificial intelligence, implemented with neural networks and deep learning models, has demonstrated great promise as a technique for automatically analyzing real-world information with human-like accuracy. Image processing, including the detection of various objects within an image, is one class of problems to which neural networks may be applied. Training deep object detectors of a neural network or deep learning model typically requires significant human labeling effort to develop a high-quality training set, in particular, by manually identifying objects in various images with respective bounding boxes and labeling each object appropriately. Noisy annotations are more easily obtained, but they are detrimental to learning.
In the figures, elements having the same designations have the same or similar functions.
This description and the accompanying drawings that illustrate aspects, embodiments, implementations, or applications should not be taken as limiting—the claims define the protected invention. Various mechanical, compositional, structural, electrical, and operational changes may be made without departing from the spirit and scope of this description and the claims. In some instances, well-known circuits, structures, or techniques have not been shown or described in detail, as these are known to one skilled in the art. Like numbers in two or more figures represent the same or similar elements.
In this description, specific details are set forth describing some embodiments consistent with the present disclosure. Numerous specific details are set forth in order to provide a thorough understanding of the embodiments. It will be apparent, however, to one skilled in the art that some embodiments may be practiced without some or all of these specific details. The specific embodiments disclosed herein are meant to be illustrative but not limiting. One skilled in the art may realize other elements that, although not specifically described here, are within the scope and the spirit of this disclosure. In addition, to avoid unnecessary repetition, one or more features shown and described in association with one embodiment may be incorporated into other embodiments unless specifically described otherwise or if the one or more features would make an embodiment non-functional.
Artificial intelligence, implemented with neural networks and deep learning models, has demonstrated great promise as a technique for automatically analyzing real-world information with human-like accuracy. In general, such neural network and deep learning models receive input information and make predictions based on the same. Whereas other approaches to analyzing real-world information may involve hard-coded processes, statistical analysis, and/or the like, neural networks learn to make predictions gradually, by a process of trial and error, using a machine learning process. A given neural network model may be trained using a large number of training examples, proceeding iteratively until the neural network model begins to consistently make inferences from the training examples similar to those a human might make. Neural network models have been shown to outperform and/or have the potential to outperform other computing techniques in a number of applications.
Image processing, including the detection of various objects within an image, is one class of problems to which neural networks may be applied. Training deep object detectors of a neural network or deep learning model typically requires significant human labeling effort to develop a high-quality training set, in particular, by manually identifying objects in various images with respective bounding boxes and labeling each object appropriately. A bounding box can define a region or area of an image associated with an object (i.e., the area of an image in which an object is found). A label can be a description for the object. Noisy annotations are more easily obtained, but they are detrimental to learning.
The present disclosure provides systems and methods for addressing the problem of training object detectors with a mixture of label noise and bounding box noise. According to some embodiments, a learning framework is provided which jointly optimizes object labels, bounding box coordinates, and model parameters by performing alternating noise correction and model training. In some embodiments, to disentangle label noise and bounding box noise, a two-step noise correction method is employed. In some examples, the first step performs class-agnostic bounding box correction by minimizing classifier discrepancy and maximizing region objectness. In some examples, the second step uses dual detection heads for label correction and class-specific bounding box refinement. Experiments have shown that the systems and methods of the present disclosure achieve state-of-the-art performance by effectively cleaning the annotation noise.
As used herein, the term “network” may comprise any hardware or software-based framework that includes any artificial intelligence network or system, neural network or system and/or any training or learning models implemented thereon or therewith.
As used herein, the term “module” may comprise a hardware- or software-based framework that performs one or more functions. In some embodiments, the module may be implemented on one or more neural networks.
Computing Device
According to some embodiments, the systems of the present disclosure—including the various networks, models, and modules—can be implemented in one or more computing devices.
Memory 120 may be used to store software executed by computing device 100 and/or one or more data structures used during operation of computing device 100. Memory 120 may include one or more types of machine readable media. Some common forms of machine readable media may include floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
Processor 110 and/or memory 120 may be arranged in any suitable physical arrangement. In some embodiments, processor 110 and/or memory 120 may be implemented on a same board, in a same package (e.g., system-in-package), on a same chip (e.g., system-on-chip), and/or the like. In some embodiments, processor 110 and/or memory 120 may include distributed, virtualized, and/or containerized computing resources. Consistent with such embodiments, processor 110 and/or memory 120 may be located in one or more data centers and/or cloud computing facilities.
As shown, memory 120 includes a noise correction module 130 that may be used to implement and/or emulate the neural network systems and models described further herein and/or to implement any of the methods described further herein. Noise correction module 130 may be used, in some examples, for performing alternating noise correction in one or more annotated images and model training using the same. In some embodiments, noise correction module 130 may include a bounding box correction module 140 and a label correction module 150. In some embodiments, computing device 100 implements, provides, or supports a learning framework or approach which jointly optimizes object labels, bounding box coordinates, and model parameters by performing alternating noise correction and model training. In some embodiments, in this framework, bounding box correction module 140 performs class-agnostic bounding box correction of noisy image data by minimizing classifier discrepancy and maximizing region objectness, and label correction module 150 uses dual detection heads for label correction of the noisy image data and class-specific bounding box refinement.
In some examples, memory 120 may include non-transitory, tangible, machine readable media that includes executable code that when run by one or more processors (e.g., processor 110) may cause the one or more processors to perform the methods described in further detail herein. In some examples, noise correction module 130, bounding box correction module 140, and/or label correction module 150 may be implemented using hardware, software, and/or a combination of hardware and software. As shown, computing device 100 receives input 160, which is provided to noise correction module 130. Input 160 may include a training dataset with images and noisy annotations. Noise correction module 130 may generate output 170, which can include corrected annotations.
Learning Framework or Approach for Object Detection with Noisy Annotations
According to some embodiments, systems and methods implement a robust learning framework or approach for object detection with noisy annotations.
One or more of the processes of method 300 may be implemented, at least in part, in the form of executable code stored on non-transitory, tangible, machine-readable media that when run by one or more processors may cause the one or more processors (e.g., processor 110) of a computing device (e.g., computing device 100) to perform one or more of the processes. Some common forms of machine readable media that may include the processes of method 300 are, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
Referring to the figures, the neural model Θ includes or uses an object detector, which in some embodiments, can be implemented with a Region-based Convolutional Neural Network (R-CNN), such as, for example, Faster R-CNN (as described in more detail in Ren et al., “Faster R-CNN: towards real-time object detection with region proposal networks,” In NIPS, pages 91-99 (2015), which is incorporated by reference herein). In some embodiments, the object detector can be a multi-stage object detector. In some embodiments, the multi-stage object detector includes a backbone feature extractor 220 with parameters θcnn, a Region Proposal Network (RPN) 230 with parameters θrpn, and one or more detection heads 240a, 240b.
In some embodiments, the RPN 230 takes an image (of any size) as input and outputs a set of rectangular object proposals 250 (for bounding boxes), each with an objectness score. “Objectness” measures membership to a set of object classes versus background.
In some embodiments, each detection head 240a, 240b may include a classification head with parameters θc, and a bounding box (bbox) regression head with parameters θb. The bbox regression head may generate a prediction for a bounding box of an object in an image, and the classification head may generate a prediction for a classification or label to assign to the object in the bounding box. In some embodiments, the classification head θc and bbox regression head θb of a detection head 240 have shared layers. Let detection head 240 with parameters θd denote the union of the classification head θc and the bbox regression head θb.
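For illustration only, one such detection head might be sketched in PyTorch as follows; the feature dimension, the shared-layer structure, and the per-class offset layout are assumptions made for the example, not the patent's specified architecture.

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    """Sketch of one detection head theta_d, the union of a classification
    head theta_c and a bbox regression head theta_b with shared layers.
    Sizes are illustrative assumptions."""
    def __init__(self, in_dim=1024, num_classes=21):  # C+1 classes incl. background
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU())  # shared layers
        self.cls_head = nn.Linear(1024, num_classes)              # theta_c
        self.bbox_head = nn.Linear(1024, 4 * (num_classes - 1))   # theta_b: per-class offsets

    def forward(self, roi_feat):
        h = self.shared(roi_feat)
        # Softmax class probabilities and per-class bbox regression offsets.
        return self.cls_head(h).softmax(dim=-1), self.bbox_head(h)
```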
In the framework 200, the model Θ is first warmed up by training the object detector using the original noisy annotations Y and B. After the warm-up, the framework 200 alternately performs optimization on the annotations and training of the model. In some embodiments, this alternating noise correction and model training is performed in mini-batches of data, over multiple iterations (e.g., up to MaxIters). Specifically, at a process 320 of method 300, for each mini-batch of data X={x}, Y={y}, B={b}, the framework 200 first keeps model Θ fixed and performs noise correction to update Y and B. Then, at a process 330 of method 300, the framework 200 uses the corrected annotations to update the model Θ.
During training, framework 200 can simultaneously train the two detection heads θd1={θc1, θb1} and θd2={θc2, θb2}, which, in some embodiments, are kept diverged from each other by different parameter initializations and different training instance (i.e., RoI) sampling. The dual detection heads—each comprising a classification head θc and bbox regression head θb—are utilized to correct annotation noise (e.g., label noise, or bounding box (bbox) noise).
Due to the entanglement of an unknown mixture of label noise and bbox noise, it can be difficult to correct both types of annotation noise in a single step. Thus, according to some embodiments, the system or framework 200 implements a two-step noise correction method. In the first step, framework 200 performs class-agnostic bounding box correction (CA-BBC), which disentangles bbox noise and label noise by directly optimizing the noisy ground-truth (GT) boxes regardless of their class labels. In the second step, framework 200 utilizes the outputs from dual detection heads for label noise correction and class-specific bbox refinement. The updated annotations may then be used to train the neural network model Θ, in particular, for detecting various objects within images.
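By way of a minimal sketch only, the alternating schedule might be outlined as follows. The callables correct_boxes, correct_labels, and train_step are hypothetical stand-ins for CA-BBC, dual-head correction, and the detector update, respectively, and the iteration counts are illustrative assumptions, not the patent's actual API.

```python
def alternating_training(loader, correct_boxes, correct_labels, train_step,
                         warmup_iters=500, max_iters=10000):
    """Alternate per-mini-batch noise correction and model training.

    `loader` yields mini-batches (images, labels, boxes) = (X, Y, B).
    """
    for it, (images, labels, boxes) in enumerate(loader):
        if it >= max_iters:
            break
        if it < warmup_iters:
            # Warm-up: train on the original noisy annotations Y and B.
            train_step(images, labels, boxes)
            continue
        # Process 320: keep the model fixed; two-step annotation correction.
        boxes = correct_boxes(images, boxes)                    # step 1: CA-BBC (B -> B*)
        labels, boxes = correct_labels(images, labels, boxes)   # step 2: dual-head (Y*, B**)
        # Process 330: update the model with the corrected annotations.
        train_step(images, labels, boxes)
```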
Class-Agnostic Bounding Box Correction
In a first step or process 322 of method 300, the framework 200 performs class-agnostic bounding box correction (CA-BBC). In particular, in some embodiments, framework 200 corrects bounding box noise by updating B→B* regardless of the label noise in Y. In this way, the operation is class-agnostic, as it does not consider the class or label assigned to the object. In some embodiments, CA-BBC is performed or implemented by bounding box correction module 140.
Class-agnostic bounding box correction is illustrated, for example, in the accompanying figures.
Specifically, in some embodiments, given an image x∈X, the backbone feature extractor 220 first extracts a convolutional feature map. For each noisy ground truth (GT) bounding box b∈B, framework 400 performs a Region of Interest (RoI)-Pooling operation on the feature map to extract a fixed-size feature ϕ(x, b). The extracted RoI feature is provided to the two classification heads 410a, 410b to produce two sets of softmax predictions over C+1 classes (including the background class), p1(ϕ(x, b); θc1) and p2(ϕ(x, b); θc2). For simplicity, these class predictions can be denoted as p1 and p2. The discrepancy between the two predictions p1, p2 is defined as their L2 distance:
$$\mathcal{D}(p_1, p_2) = \lVert p_1 - p_2 \rVert_2^2. \tag{1}$$
Minimizing the classifier discrepancy 𝒟(p1, p2) with respect to the bounding box 440 will push the bounding box to a region where the two classifiers 410a-b agree on its class label. To prevent the bounding box 440 from simply moving to a background region of image 430, framework 200 may also minimize the classifiers' scores on the background class, p1bg and p2bg. In other words, framework 200 maximizes the objectness of the region covered by the bounding box 440.
Therefore, framework 200 aims to find the optimal bounding box b* that minimizes the following objective loss function:

$$\mathcal{L}(b) = \mathcal{D}(p_1, p_2) + \lambda\,(p_1^{bg} + p_2^{bg}), \tag{2}$$
where λ controls the balance of the two terms and, in some embodiments, is set to 0.1 in the experiments.
For faster speed, in some embodiments, framework 200 estimates bounding box b* by performing a single step of gradient descent to update bounding box b:

$$b^* = b - \alpha\,\frac{\partial \mathcal{L}(b)}{\partial b}, \tag{3}$$

where α is the step size.
Since the techniques of RoI-Pooling or RoI-Align perform discrete sampling on the feature map to generate ϕ(x, b), the loss 𝓛(b) is not differentiable with respect to bounding box b. Therefore, in some embodiments, the framework 200 adopts the Precise RoI-Pooling method (as described in further detail in Jiang et al., “Acquisition of localization confidence for accurate object detection,” In ECCV, pages 816-832 (2018), incorporated by reference herein), which avoids any quantization of coordinates and has a continuous gradient on b.
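As an illustrative sketch of Equations (1)-(3) only, the CA-BBC update might be expressed in PyTorch as follows. It assumes a roi_pool callable that is differentiable with respect to box coordinates (e.g., Precise RoI-Pooling), two classifier callables clf1 and clf2 that return softmax probabilities over C+1 classes, and the convention that index 0 is the background class; all of these interfaces are assumptions, not the patent's actual API. The default λ = 0.1 follows the value reported above, and α = 100 is one of the step sizes reported later in this disclosure.

```python
import torch

def ca_bbc_step(feature_map, box, clf1, clf2, roi_pool, lam=0.1, alpha=100.0):
    """One class-agnostic bounding box correction step (Eqs. 1-3)."""
    b = box.clone().detach().requires_grad_(True)
    feat = roi_pool(feature_map, b)               # phi(x, b); differentiable in b
    p1, p2 = clf1(feat), clf2(feat)               # softmax predictions over C+1 classes
    discrepancy = ((p1 - p2) ** 2).sum()          # Eq. (1): ||p1 - p2||_2^2
    background = (p1[..., 0] + p2[..., 0]).sum()  # background scores (index 0 assumed)
    loss = discrepancy + lam * background         # Eq. (2)
    grad_b, = torch.autograd.grad(loss, b)        # gradient w.r.t. box coordinates
    return (b - alpha * grad_b).detach()          # Eq. (3): single gradient-descent step
```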
In some embodiments, it is observed that the entropy of the classification heads' predictions over object classes would decrease after updating b to b*. A lower entropy suggests that the classifiers are more confident of their predicted object class, which may verify the assumption that b* contains representative information for one and only one object.
Dual-Head Noise Correction
In some embodiments, the framework 200 simultaneously trains two diverged heads (e.g., detection heads 240a, 240b, each including a respective classification head and bbox regression head) with distinct abilities to filter different types of noise, and uses their ensemble to clean the annotation noise. In some embodiments, the systems and methods distill knowledge from each detection head 240a, 240b to teach the other. That is, in some examples, co-teaching is employed in the dual-head network, where each detection head selects box samples with small classification loss to train the other head. This helps alleviate the confirmation bias problem (i.e., a model confirms its own mistakes) and achieves robustness to noise. In some embodiments, the Region Proposal Network (RPN) 230 is trained on all boxes.
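For illustration, the small-loss co-teaching selection might look like the following sketch; the per-RoI loss tensor and the keep_ratio value are assumptions for the example, not parameters specified in this disclosure.

```python
import torch

def small_loss_selection(per_roi_losses, keep_ratio=0.7):
    """Select the RoIs one head is most confident about (smallest
    classification loss); these are then used to train the *other* head,
    mitigating confirmation bias. `keep_ratio` is an assumed value."""
    k = max(1, int(keep_ratio * per_roi_losses.numel()))
    return torch.topk(per_roi_losses, k, largest=False).indices
```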
Thus, in a second step or process 324 of method 300, the framework 200 performs label Y noise correction and class-specific bounding box B refinement, utilizing the outputs from dual detection heads 240a, 240b.
Label correction. In some embodiments, given the Region of Interest (RoI) feature ϕ(x, b*), the two classification heads (e.g., which may be part of or incorporated in detection heads 240a, 240b) produce two sets of softmax predictions over object classes, p1* and p2*. Considering the bootstrapping method (as described in more detail in Reed et al., “Training deep neural networks on noisy labels with bootstrapping,” In ICLR (2015), which is incorporated by reference herein), in some embodiments, the framework 200 uses the classifiers' predictions to update the noisy GT label (e.g., a noisy label such as “Dog”), generating a soft label:

$$\bar{y} = (p_1^* + p_2^* + y)/3. \tag{4}$$
Then a sharpening function may be applied on the soft label to reduce the entropy of the label distribution (e.g., changing the label from “Dog” to “Cat”):

$$y_c^* = \bar{y}_c^{\,1/T} \Big/ \sum_{c'} \bar{y}_{c'}^{\,1/T}, \tag{5}$$

where c indexes the classes and T is the sharpening temperature; a lower temperature yields a lower-entropy label distribution.
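A minimal sketch of this label correction, assuming a one-hot noisy GT label y and the temperature T = 0.4 reported later in this disclosure:

```python
import torch

def correct_label(p1_star, p2_star, y_onehot, T=0.4):
    """Soft-label update (Eq. 4) followed by temperature sharpening (Eq. 5)."""
    y_bar = (p1_star + p2_star + y_onehot) / 3.0        # Eq. (4): average predictions
                                                        # and the noisy GT label
    y_sharp = y_bar ** (1.0 / T)                        # raise to power 1/T
    return y_sharp / y_sharp.sum(dim=-1, keepdim=True)  # Eq. (5): renormalize
```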
Class-specific bounding box refinement. In some embodiments, for class-specific bounding box refinement, the framework 200 directly regresses the noisy ground-truth (GT) bounding box to minimize both classifier discrepancy and background scores. The two bbox regression heads (e.g., which may be part of or incorporated in detection heads 240a, 240b) produce two sets of per-class bounding box regression offsets, t1 and t2. Let c* denote the class with the highest score in the soft label, i.e., c* = arg maxc y*c, where c = 1, 2, . . . , C. In some embodiments, the bounding box b* is refined by merging the class-specific outputs from both bbox regression heads:
$$t = (t_1^{c^*} + t_2^{c^*})/2,$$
$$b^{**} = b^* + \rho\, t, \tag{6}$$
where t1c* and t2c* are the bounding box offsets for class c*, and ρ controls the magnitude of the refinement. b** is the refined, class-specific bounding box. b** serves as a new ground truth, and may be compared against random sampling of potential bounding boxes, e.g., Sampling A and Sampling B, for training of the model Θ. In some examples, Sampling A and Sampling B are not the same.
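By way of illustration, assuming per-class offsets of shape (C, 4) from each regression head and soft-label scores over the C object classes (background excluded) — both assumed tensor layouts — the refinement of Equation (6) might be sketched as follows, with ρ = 0.5 taken from the hyper-parameters reported below:

```python
import torch

def refine_box(b_star, t1, t2, y_star_obj, rho=0.5):
    """Class-specific bbox refinement (Eq. 6); tensor layouts are assumptions."""
    c_star = torch.argmax(y_star_obj)     # object class with the highest soft-label score
    t = (t1[c_star] + t2[c_star]) / 2.0   # average both heads' offsets for class c*
    return b_star + rho * t               # b** = b* + rho * t
```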
Model Training
After performing noise correction (with fixed model parameters) for a given mini-batch, the framework 200 next trains or updates the model (with the corrected annotations) at process 330. Let Y* and B** denote a mini-batch of soft labels and refined bounding boxes, respectively. In some embodiments, these are used as the new ground-truths (GT) to train the model Θ.
In some embodiments, the model Θ is trained using Stochastic Gradient Descent (SGD) with a learning rate of 0.02, a momentum of 0.9, and a weight decay of 1e-4. The hyper-parameters are set as λ=0.1, T=0.4, ρ=0.5, and α∈{0, 100, 200}, which are determined by the validation performance on 10% of training data with clean annotations (only used for validation).
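For illustration, the reported optimizer settings correspond to a PyTorch configuration along the following lines; the detector model itself is an assumed argument.

```python
import torch

def make_optimizer(model: torch.nn.Module) -> torch.optim.SGD:
    # SGD with the learning rate, momentum, and weight decay reported above.
    return torch.optim.SGD(model.parameters(), lr=0.02,
                           momentum=0.9, weight_decay=1e-4)
```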
After updating the model Θ for the current mini-batch, the method 300 returns to process 320 for noise correction on the next mini-batch, and then to process 330 for further model training/updating. Processes 320 and 330 are repeated for the remaining mini-batches until the last iteration (e.g., MaxIters) has been processed. Thereafter, the model Θ may be applied for detecting and labeling objects in various images, for example, by generating respective bounding boxes and labeling each object appropriately.
Experiments on the systems and methods employing or implementing the framework or model for two-step noise correction were conducted. In some examples, for the experiments, the training data of two popular benchmark datasets, PASCAL VOC 2007 and MS-COCO, were corrupted with both label noise and bounding box noise. Results of the systems and methods employing or implementing the framework or model of the present disclosure may be compared against other methods or approaches for learning with noisy annotations. In some examples, these other approaches include Vanilla, Co-teaching, SD-LocNet, and Note-RCNN.
Table 700 of the accompanying figures presents these comparison results.
An ablation study may be conducted to dissect the framework and method of the present disclosure and provide qualitative results. Table 800 of the accompanying figures presents the ablation results.
Although illustrative embodiments have been shown and described, a wide range of modification, change and substitution is contemplated in the foregoing disclosure and in some instances, some features of the embodiments may be employed without a corresponding use of other features. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Thus, the scope of the invention should be limited only by the following claims, and it is appropriate that the claims be construed broadly and in a manner consistent with the scope of the embodiments disclosed herein.
This application claims priority to U.S. Provisional Patent Application No. 62/936,138, filed Nov. 15, 2019, which is incorporated by reference herein in its entirety.
Other Publications
Arazo et al., “Unsupervised Label Noise Modeling and Loss Correction,” In ICML, arXiv:1904.11238v2, pp. 312-321, 2019.
Arpit et al., “A Closer Look at Memorization in Deep Networks,” Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017, pp. 233-242.
Berthelot et al., “MixMatch: A Holistic Approach to Semi-Supervised Learning,” 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, pp. 1-11.
Blum et al., “Combining Labeled and Unlabeled Data with Co-training,” In Proceedings of the 11th Annual Conference on Computational Learning Theory, pp. 92-100, 1998.
Chen et al., “MMDetection: Open MMLab Detection Toolbox and Benchmark,” arXiv:1906.07155, pp. 1-13, 2019.
Chen et al., “Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels,” Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, pp. 1062-1070, 2019.
Cinbis et al., “Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning,” TPAMI, 39(1):189-203, 2017.
Deng et al., “ImageNet: A Large-Scale Hierarchical Image Database,” In CVPR, pp. 248-255, 2009.
Deselaers et al., “Localizing Objects while Learning their Appearance,” In ECCV, pp. 452-466, 2010.
Dietterich et al., “Solving the Multiple Instance Problem with Axis-Parallel Rectangles,” Artif. Intell., 89(1-2):31-71, 1997.
Everingham et al., “The Pascal Visual Object Classes (VOC) Challenge,” IJCV, 88(2):303-338, 2010.
Gao et al., “NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection,” In ICCV, pp. 9508-9517, 2019.
Girshick, “Fast R-CNN,” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448, 2015.
Grandvalet et al., “Semi-Supervised Learning by Entropy Minimization,” In Advances in Neural Information Processing Systems, pp. 529-536, 2005.
Han et al., “Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels,” In NeurIPS, pp. 8536-8546, 2018.
He et al., “Mask R-CNN,” In ICCV, pp. 2961-2969, 2017.
He et al., “Deep Residual Learning for Image Recognition,” In CVPR, pp. 770-778, 2016.
Jiang et al., “Acquisition of Localization Confidence for Accurate Object Detection,” In ECCV, pp. 816-832, 2018.
Jiang et al., “MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels,” In ICML, pp. 2309-2318, 2018.
Lee et al., “CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise,” In CVPR, pp. 5447-5456, 2018.
Lin et al., “Feature Pyramid Networks for Object Detection,” In CVPR, pp. 936-944, 2017.
Lin et al., “Microsoft COCO: Common Objects in Context,” In ECCV, pp. 740-755, 2014.
Ma et al., “Dimensionality-Driven Learning with Noisy Labels,” In ICML, pp. 3361-3370, 2018.
Pereyra et al., “Regularizing Neural Networks by Penalizing Confident Output Distributions,” In ICLR Workshop, 2017.
Reed et al., “Training Deep Neural Networks on Noisy Labels with Bootstrapping,” In ICLR, 2015.
Ren et al., “Learning to Reweight Examples for Robust Deep Learning,” In ICML, pp. 4331-4340, 2018.
Ren et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” In NIPS, pp. 91-99, 2015.
Tanaka et al., “Joint Optimization Framework for Learning with Noisy Labels,” In CVPR, pp. 5552-5560, 2018.
Tarvainen et al., “Mean Teachers are Better Role Models: Weight-Averaged Consistency Targets Improve Semi-Supervised Deep Learning Results,” In NIPS, pp. 1195-1204, 2017.
Vahdat, “Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks,” In NIPS, pp. 5601-5610, 2017.
Veit et al., “Learning from Noisy Large-Scale Datasets with Minimal Supervision,” In CVPR, pp. 6575-6583, 2017.
Xiao et al., “Learning from Massive Noisy Labeled Data for Image Classification,” In CVPR, pp. 2691-2699, 2015.
Yi et al., “Probabilistic End-to-End Noise Correction for Learning with Noisy Labels,” In CVPR, 2019.
Yu et al., “How Does Disagreement Help Generalization against Label Corruption?” In ICML, pp. 7164-7173, 2019.
Zhang et al., “Understanding Deep Learning Requires Rethinking Generalization,” In ICLR, 2017.
Zhang et al., “Learning to Localize Objects with Noisy Labeled Instances,” In AAAI, pp. 9219-9226, 2019.