The present invention relates to fast and optimal machine learning and more particularly to a fast and optimal machine learning method for deep semantic segmentation.
Machine learning, especially deep learning, powered by tremendous advances in computation (GPUs) and the availability of big data, has gained significant attention and is being applied to many new fields and applications. It supports end-to-end learning and learns hierarchical feature representations automatically. It is highly scalable and can achieve better prediction accuracy with more data. To handle the large variations and dynamics inherent in sequential data, a high-capacity model is often required. Deep learning can be extremely effective when trained with a high-capacity model (>10^8 parameters).
However, training a high-capacity model requires huge labeled (annotated) datasets to avoid over-fitting. For example, the image database ImageNet contains 1.2 million images in 1000 categories for deep network training. In this highly connected era of mobile and cloud computing, big datasets are becoming readily available. Therefore, the bottleneck is in acquiring the labels rather than the data. The situation is exacerbated in semantic segmentation applications, where every pixel in the ground truth image must be assigned a class category, such as vehicle, building, street, pedestrian, tree, sign, signal, or sky. Visual/manual annotation is tedious, cumbersome, and expensive, and usually must be performed by domain experts.
In the prior art deep model learning for semantic segmentation, as shown in
The primary objective of this invention is to provide an efficient method for computerized fast deep semantic segmentation using partial ground truth. The secondary objective of the invention is to provide a computerized active learning method for deep semantic segmentation. The third objective of the invention is to provide a computerized optimal learning method that weights learning by confidence. The fourth objective of this invention is to provide a computerized optimal transfer learning method for deep semantic segmentation.
In the present invention, a deep model is learned with partially annotated ground truth data in which only some regions (not the entire image) contain class labels. The regions without labels in the ground truth data are treated as “do-not-care”. The fast deep semantic segmentation learning minimizes a do-not-care robust loss function at the output layer while iteratively adjusting the parameters at each layer of the deep model.
The concepts and the preferred embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.
I. Fast Machine Learning Method for Deep Semantic Segmentation
In one embodiment of the invention, the deep model is an encoder-decoder network. The encoder takes an input image and generates a high-dimensional feature vector with features aggregated at multiple levels. The decoder decodes the features aggregated by the encoder at multiple levels and generates a semantic segmentation mask. Typical encoder-decoder networks include U-Net and its variations such as U-Net+Residual blocks, U-Net+Dense blocks, and 3D-UNet. The model can be extended to recurrent neural networks for applications such as language translation and speech recognition.
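The encoder-decoder data flow described above can be illustrated with a toy sketch. It is not the network of the invention: it assumes NumPy, replaces the usual 3x3 convolutions with per-pixel (1x1) linear maps, uses only two resolution levels, and uses random weights. All function and parameter names (`avg_pool2x`, `upsample2x`, `conv1x1_relu`, `init_params`, `toy_encoder_decoder`) are hypothetical. It only shows how features from multiple levels are aggregated through a skip connection before the per-pixel class scores are produced.

```python
import numpy as np

def avg_pool2x(x):
    """Downsample an (H, W, C) feature map by 2 using 2x2 average pooling."""
    h, w, c = x.shape  # H and W are assumed to be even
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2x(x):
    """Upsample an (H, W, C) feature map by 2 using nearest-neighbor repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1_relu(x, weight):
    """Per-pixel linear map (a 1x1 convolution) followed by ReLU."""
    return np.maximum(x @ weight, 0.0)

def init_params(in_channels, features, num_classes, seed=0):
    """Random illustrative weights; a real model would learn these."""
    rng = np.random.default_rng(seed)
    return {
        "enc1": rng.normal(scale=0.1, size=(in_channels, features)),
        "enc2": rng.normal(scale=0.1, size=(features, 2 * features)),
        "head": rng.normal(scale=0.1, size=(3 * features, num_classes)),
    }

def toy_encoder_decoder(image, params):
    """Return per-pixel class scores of shape (H, W, num_classes)."""
    # Encoder: features at full resolution and at half resolution.
    e1 = conv1x1_relu(image, params["enc1"])               # (H, W, F)
    e2 = conv1x1_relu(avg_pool2x(e1), params["enc2"])      # (H/2, W/2, 2F)
    # Decoder: upsample the coarse features and merge them with the
    # fine features through a skip connection (U-Net style concatenation).
    d1 = np.concatenate([upsample2x(e2), e1], axis=-1)     # (H, W, 3F)
    return d1 @ params["head"]                             # (H, W, num_classes)

if __name__ == "__main__":
    image = np.random.default_rng(1).random((8, 8, 1))
    logits = toy_encoder_decoder(image, init_params(1, 4, 3))
    mask = logits.argmax(axis=-1)     # segmentation mask of class indices
    print(logits.shape, mask.shape)   # (8, 8, 3) (8, 8)
```

Taking the per-pixel argmax of the scores yields a segmentation mask with the same height and width as the input image.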
In one embodiment of the invention, the fast machine learning method for deep semantic segmentation is an iterative process that gradually minimizes the loss function at the output layer by adjusting the weights/parameters (Θ*) at each layer of the model using a back propagation method. The loss function is usually the sum of the differences between the ground truth data L(x) and the model output p(I(x), Θ) over all points of the image I(x). The fast machine learning method of the invention instead calculates the loss function at the output layer by summing the differences between the ground truth and the model output only for points with labels, ignoring any points within the “do-not-care” regions of the partial ground truth data L(x). A do-not-care robust loss function can be described by the formula below:
$$\Theta^{*} = \underset{\Theta}{\arg\min} \int_{\text{label} \neq \text{do-not-care}} \mathrm{Loss}\big(p(I(x), \Theta),\, L(x)\big)\, dx$$
The back propagation method is then applied to adjust weights/parameters (Θ*) at each layer of the model once the loss function is calculated and minimized at the last layer.
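A minimal sketch of such a do-not-care robust loss follows, under stated assumptions: the per-point `Loss` is taken to be cross-entropy (the text does not fix a particular loss), the integral is replaced by a sum over pixels, and unlabeled pixels are marked with a hypothetical sentinel value `DO_NOT_CARE = -1`. The function also returns the gradient with respect to the output-layer scores, which is the quantity that back propagation would pass to earlier layers. It is zero at every do-not-care pixel, so those pixels do not influence the parameter update.

```python
import numpy as np

DO_NOT_CARE = -1  # hypothetical sentinel marking unlabeled pixels

def softmax(logits):
    """Numerically stable softmax over the last (class) axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def do_not_care_loss(logits, labels, ignore=DO_NOT_CARE):
    """Cross-entropy summed over labeled pixels only.

    logits: (H, W, C) float array of output-layer scores.
    labels: (H, W) int array of class indices, or `ignore` where unlabeled.
    Returns (loss, grad), where grad has the shape of logits.
    """
    probs = softmax(logits)                        # (H, W, C)
    labeled = labels != ignore                     # (H, W) boolean mask
    safe = np.where(labeled, labels, 0)            # valid index at every pixel
    picked = np.take_along_axis(probs, safe[..., None], axis=-1)[..., 0]
    # Sum the per-pixel loss over labeled pixels only.
    loss = -np.sum(np.log(picked[labeled] + 1e-12))
    # Gradient of softmax cross-entropy, zeroed in do-not-care regions.
    one_hot = np.eye(logits.shape[-1])[safe]       # (H, W, C)
    grad = (probs - one_hot) * labeled[..., None]
    return loss, grad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 4, 3))
    labels = np.full((4, 4), DO_NOT_CARE)
    labels[0, 0] = 2          # only two pixels carry class labels
    labels[1, 2] = 0
    loss, grad = do_not_care_loss(logits, labels)
    print(loss, np.count_nonzero(np.abs(grad).sum(axis=-1)))
```

Because the loss and its gradient depend only on labeled pixels, changing the model output inside a do-not-care region leaves the loss unchanged.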
II. Active Machine Learning Method for Deep Semantic Segmentation
The output of applying a trained deep model to an input image is a probability map that represents the per-pixel prediction by the deep learning model. It is also possible to generate a confidence map as the second output that represents the confidence on the per-pixel prediction by the deep model.
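The two outputs can be sketched as follows. The text does not specify how the confidence map is computed, so this sketch makes an assumption: confidence is derived from the probability map as one minus the normalized entropy of the per-pixel class distribution. A confidence map could equally be a separate learned output of the model. The function names `probability_map` and `confidence_map` are hypothetical.

```python
import numpy as np

def probability_map(logits):
    """Per-pixel class probabilities from (H, W, C) output-layer scores."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_map(probs, eps=1e-12):
    """Assumed confidence measure: 1 - normalized entropy, roughly in [0, 1].

    Values near 1 mean one class dominates at that pixel;
    values near 0 mean the prediction is close to uniform.
    """
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1)   # (H, W)
    return 1.0 - entropy / np.log(probs.shape[-1])

if __name__ == "__main__":
    logits = np.zeros((2, 2, 3))
    logits[0, 0] = [50.0, 0.0, 0.0]   # one very confident pixel
    probs = probability_map(logits)
    print(confidence_map(probs))
```

In this example the pixel with a dominant class score receives a confidence near 1, while pixels with equal scores for all classes receive a confidence near 0.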
In
III. Optimal Machine Learning Method for Deep Semantic Segmentation
IV. Optimal Transfer Learning Method for Deep Semantic Segmentation
The invention has been described herein in considerable detail in order to comply with the Patent Statutes and to provide those skilled in the art with the information needed to apply the novel principles and to construct and use such specialized components as are required. The embodiments described herein are exemplary and may simplify or omit elements or steps well-known to those skilled in the art to prevent obscuring the present invention. However, it is to be understood that the inventions can be carried out by specifically different equipment and devices, and that various modifications, both as to the equipment details and operating procedures, can be accomplished without departing from the scope of the invention itself.
This work was supported by U.S. Government grant number 4R44NS097094-02, awarded by the NATIONAL INSTITUTE OF NEUROLOGICAL DISORDERS AND STROKE. The U.S. Government may have certain rights in the invention.
Number | Name | Date | Kind |
---|---|---|---|
20080312906 | Balchandran | Dec 2008 | A1 |
Entry |
---|
Ronneberger et al., “U-Net: Convolutional Networks for Biomedical Image Segmentation”, arXiv:1505.04597v1 [cs.CV] May 18, 2015. |
He et al., “Identity Mappings in Deep Residual Networks”, arXiv:1603.05027v3 [cs.CV] Jul. 25, 2016. |
Garcia-Garcia et al., “A Review on Deep Learning Techniques Applied to Semantic Segmentation”, arXiv:1704.06857v1 [cs.CV] Apr. 22, 2017. |
Huang et al., “Densely Connected Convolutional Networks”, arXiv:1608.06993v5 [cs.CV] Jan. 28, 2018. |
Number | Date | Country |
---|---|---|
20200242414 A1 | Jul 2020 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16010593 | Jun 2018 | US |
Child | 16851119 | | US |