Devices that search for and identify objects must process large amounts of image data. If the images are processed at a lower resolution, precision and recall may degrade. If very high resolution images are used, the amount of data processing may explode. This invention presents a technique for building a device that performs image processing for object search and identification in a multi-resolution manner, controlling the computational cost while still maintaining high accuracy.
In the field of deep neural networks, models continue to become deeper and the corresponding training datasets continue to become larger. In particular, large quantities of high-resolution images, paired with the millions of parameters in state-of-the-art computer vision models, result in an exceptionally long training process. Very high resolution images also limit the batch size and therefore the quality of training. A simple solution is to lower the resolution of the images, but this sacrifices minute, yet potentially important, details. Another, equally sacrificial, solution is to crop the image, which preserves the high resolution but discards entire regions of the image. This invention combines these two techniques, using multiple models and Gradient Class Activation Maps (GradCAM) to crop the images in a way that retains the most information. This solution improves classification accuracy while maintaining a reasonable training time.
In the preferred embodiment of this invention, an image processing library such as OpenCV is used to subsample the images to a lower resolution such as 331×331. The subsampled images (106) are stored separately on the hard disk of the computer. The low resolution images and the corresponding classification labels are used to train a convolutional neural network based classifier such as NasNet or Xception [1, 7]. Other architectures may also be used.
Gradient Class Activation Map (GradCAM) is a known technique used to explain the decision of a convolutional neural network [4]. It works on the output of the final and pre-final layers of a convolutional neural network, to create a heatmap indicating the region of interest that is responsible for the classification decision. The implementation of GradCAM in this invention contains optimizations in the code that cause it to run very fast, even on large datasets.
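The heatmap computation described in [4] can be sketched in a few lines of NumPy. This is a minimal illustration and not the optimized implementation of this invention. It assumes that the feature maps of the last convolutional layer and the gradients of the class score with respect to those feature maps have already been extracted from the trained network. The function name `grad_cam_heatmap` and the (height, width, channels) array layout are hypothetical choices made for the example.

```python
import numpy as np


def grad_cam_heatmap(activations, gradients):
    """Combine feature maps and gradients into a Grad-CAM style heatmap.

    activations: (H, W, K) feature maps of the last convolutional layer.
    gradients:   (H, W, K) gradients of the class score w.r.t. those maps.
    Both arrays are assumed to be extracted beforehand from the framework.
    """
    activations = np.asarray(activations, dtype=np.float64)
    gradients = np.asarray(gradients, dtype=np.float64)

    # One weight per channel: the gradient averaged over the spatial axes.
    weights = gradients.mean(axis=(0, 1))            # shape (K,)

    # Weighted sum of the feature maps, followed by a ReLU so that only
    # regions with a positive influence on the class score remain.
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)

    # Scale to [0, 1]; an all-zero ("black") map is returned unchanged.
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam
```

The returned map has the spatial size of the feature maps, so it is typically much smaller than the input image and has to be related back to image coordinates before it is used for cropping.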
GradCAM is applied to the classification output of the low resolution image to obtain a heatmap [4]. The centroid of that heatmap is calculated, and the high resolution image is then cropped in such a way that the centroid of the cropped image coincides with the centroid of the heatmap as shown in block 102 of
The resulting high resolution but cropped image is then used as a new input to train a new neural network for the final classification of the image. Our implementation of GradCAM is novel and performs at a higher speed than many other implementations. Example code describing the implementation is shown below.
The initialization portion is run only once, at the beginning. The GradCAM approach (depicted in
This approach allows us to train on millions of images without consuming exorbitant amounts of time and compute. GradCAM heatmaps are often completely black, in which case no clear centroid exists. We therefore add a small heat bias at the center of the heatmap, which ensures that a centroid is always found.
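The centroid calculation with a center bias, and the subsequent crop of the high resolution image, might look like the following NumPy sketch. It is an illustration under stated assumptions, not the implementation of this invention: the function names, the bias value of 1e-3, the square crop window, and the clamping of the window to the image borders are all hypothetical choices. The crop size is assumed to be no larger than the image.

```python
import numpy as np


def biased_centroid(heatmap, bias=1e-3):
    """Return the (row, col) centroid of a heatmap after adding a small
    amount of heat at its center, so that the total heat is never zero."""
    hm = np.asarray(heatmap, dtype=np.float64).copy()
    h, w = hm.shape
    hm[h // 2, w // 2] += bias            # assumed bias value
    total = hm.sum()
    cy = (hm.sum(axis=1) * np.arange(h)).sum() / total
    cx = (hm.sum(axis=0) * np.arange(w)).sum() / total
    return cy, cx


def crop_around_centroid(image, heatmap, crop_size):
    """Crop a crop_size x crop_size window from the high resolution image,
    centered on the heatmap centroid wherever the image borders allow."""
    H, W = image.shape[:2]
    h, w = heatmap.shape
    cy, cx = biased_centroid(heatmap)

    # Map heatmap cell centers to high resolution pixel coordinates.
    cy_hi = (cy + 0.5) * H / h
    cx_hi = (cx + 0.5) * W / w

    # Top-left corner of the window, clamped to stay inside the image.
    top = int(round(cy_hi - crop_size / 2))
    left = int(round(cx_hi - crop_size / 2))
    top = min(max(top, 0), H - crop_size)
    left = min(max(left, 0), W - crop_size)
    return image[top:top + crop_size, left:left + crop_size]
```

With an all-black heatmap, the bias alone determines the centroid, so the crop falls back to a window near the middle of the image. When the centroid lies close to a border, the clamping shifts the window inward, so the two centroids coincide only approximately in that case.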
To test the usability and performance of this invention, CT scans of the brain, some normal and others showing one of five different types of intracranial hemorrhage, are used to train a deep CNN architecture to classify intracranial hemorrhage. Accuracy was calculated for both the single-stage approach and the two-stage multi-resolution approach.
References
[1] Chollet, F. (2017). Xception: deep learning with depthwise separable convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1800-1807.
[2] Hssayeni, M. D., Croock, M. S., Al-Ani, A., Al-khafaji, H. F., Yahya, Z. A., Ghoraani, B. (2019). Intracranial hemorrhage segmentation using a deep convolutional model. doi:10.13026/w8q8-ky94
[3] Kuo, W., Häne, C., Mukherjee, P., Malik, J., Yuh, E. L. (2019). Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning. PNAS, 116(45), 22737-22745. doi:10.1073/pnas.1908021116
[4] Selvaraju, R. R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D. (2016). Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2), 336-359.
[5] Shen, J., Zhang, C., Jiang, B., Chen, J., Song, J., Liu, Z., . . . Ming, W. K. (2019). Artificial intelligence versus clinicians in disease diagnosis: Systematic review. JMIR Medical Informatics, 7(3). doi:10.2196/10010
[6] Ye, H., Gao, F., Yin, Y., Guo, D., Zhao, P., Lu, Y., . . . Xia, J. (2019). Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network. European Radiology, 29, 6191-6201. doi:10.1007/s00330-019-06163-2
[7] Zoph, B., Vasudevan, V., Shlens, J., Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8697-8710. doi:10.1109/CVPR.2018.00907