Segmentation of images is a tool used in a variety of fields. For example, there is an extensive literature on segmentation of cancer lesions in both X-ray and ultrasound images. Recently, a new generation of ultrasound technology has been developed that takes advantage of the optoacoustic effect, i.e., optoacoustic imaging. In one form, optoacoustic imaging uses laser illumination to highlight the presence of hemoglobin molecules in the tissue. The hemoglobin response can be used to determine the oxygenation of the tissue surrounding tumors. Cancerous tumors tend to disrupt local vascular networks whereas benign tumors do not, with the result that cancerous tumors have deoxygenated peripheries. There has been work on producing segmented images from optoacoustic features, primarily using U-Net architectures. Prior art has applied U-Nets to single-channel optoacoustic (OA) images, or naively combined multiple OA channels at the input.
Further, there have been many applications of deep networks to optoacoustic data. Deep neural networks have been used to remove artifacts from optoacoustic images [N. Davoudi, X. L. Deán-Ben, and D. Razansky, "Deep learning optoacoustic tomography with sparse data," Nature Machine Intelligence 1, 453-460 (2019)]. Deep neural networks have also been used to classify the level of oxygenation in tissue [C. Yang and F. Gao, "EDA-Net: Dense aggregation of deep and shallow information achieves quantitative photoacoustic blood oxygenation imaging deep in human breast," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 246-254, Springer (2019); C. Yang, H. Lan, H. Zhong, et al., "Quantitative photoacoustic blood oxygenation imaging using deep residual and recurrent neural network," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), 741-744, IEEE (2019); G. P. Luke, K. Hoffer-Hawlik, A. C. Van Namen, et al., "O-Net: A Convolutional Neural Network for Quantitative Photoacoustic Image Segmentation and Oximetry," arXiv preprint arXiv:1911.01935 (2019); K. Hoffer-Hawlik and G. P. Luke, "absO2luteU-Net: Tissue oxygenation calculation using photoacoustic imaging and convolutional neural networks," (2019)]. Tissue classification has also been done by traditional feature design with SVM and random forest classifiers [S. Moustakidis, M. Omar, J. Aguirre, et al., "Fully automated identification of skin morphology in raster-scan optoacoustic mesoscopy using artificial intelligence," Medical Physics 46(9), 4046-4056 (2019)]. Lafci applies a generic U-Net architecture to this problem [B. Lafci, E. Mercep, S. Morscher, et al., "Deep learning for automatic segmentation of hybrid optoacoustic ultrasound (OPUS) images," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 68(3) (2021); B. Lafci, E. Mercep, S. Morscher, et al., "Efficient segmentation of multi-modal optoacoustic and ultrasound images using convolutional neural networks," in Photons Plus Ultrasound: Imaging and Sensing 2020, 11240, 112402N, International Society for Optics and Photonics (2020)]. Luke uses two U-Nets, one to segment blood vessels and a second to estimate SO2 concentration [G. P. Luke, K. Hoffer-Hawlik, A. C. Van Namen, et al., arXiv:1911.01935, cited above]. Gröhl employs a U-Net and fully connected networks (FCNNs) [J. Gröhl, M. Schellenberg, K. Dreher, et al., "Semantic segmentation of multispectral photoacoustic images using deep learning," https://arxiv.org/pdf/2105.09624.pdf]. Jnawali explores the use of 3D convolution to obtain blood volume concentration in thyroid samples [K. Jnawali, B. Chinni, V. Dogra, et al., "Deep 3D convolutional neural network for automatic cancer tissue detection using multispectral photoacoustic imaging," in Medical Imaging 2019: Ultrasonic Imaging and Tomography, 10955, 109551D, International Society for Optics and Photonics (2019); K. Jnawali, B. Chinni, V. Dogra, et al., "Transfer learning for automatic cancer tissue detection using multispectral photoacoustic imaging," in Medical Imaging 2019: Computer-Aided Diagnosis, 10950, 109503W, International Society for Optics and Photonics (2019); K. Jnawali, B. Chinni, V. Dogra, et al., "Automatic cancer tissue detection using multispectral photoacoustic imaging," International Journal of Computer Assisted Radiology and Surgery 15(2), 309-320 (2020)]. Chlis weights optoacoustic channel features at the input and then passes them through a U-Net [N.-K. Chlis, A. Karlas, N.-A. Fasoula, et al., "A sparse deep learning approach for automatic segmentation of human vasculature in multispectral optoacoustic tomography," Photoacoustics 20 (2020)]. Yuan combines a fully connected network and a U-Net on a single-channel optoacoustic image [A. Y. Yuan, Y. Gao, L. Peng, et al., "Hybrid deep learning network for vascular segmentation in photoacoustic imaging," Biomedical Optics Express 11(11), 6445-6457 (2020)].
The field, however, lacks work that combines traditional ultrasound (US) features with optoacoustic (OA) features in deep networks. Naïve early combination prevents effective use of pretrained deep networks because the combined US-OA features do not resemble natural image features: US features are highly structural, whereas OA features are volumetric and diffuse.
According to one aspect of the presently described embodiments, a method for improved semantic segmentation in images comprises receiving image data from multiple channels corresponding to disparate input sources, splitting the image data into data streams, encoding the data streams using separate encoders to obtain encoded data at each stage of the separate encoders, concatenating the encoded data across each stage of the separate encoders, encoding the concatenated data for each stage of the encoders using a single layer of non-linear units to obtain a feature array, and outputting the feature array. A non-limiting code sketch illustrating this flow is set forth following the aspects below.
According to another aspect of the presently described embodiments, the disparate input sources comprise ultrasound and opto-acoustic input sources.
According to another aspect of the presently described embodiments, the disparate input sources comprise input sources using different sensors on different bands.
According to another aspect of the presently described embodiments, the method further comprises use of pretrained networks to accelerate learning.
According to another aspect of the presently described embodiments, the method further comprises decoding the feature array.
According to another aspect of the presently described embodiments, the images include images of cancer lesions.
According to another aspect of the presently described embodiments, the images include satellite images.
According to another aspect of the presently described embodiments, a system for improved semantic segmentation in images comprises at least one processor and at least one memory, having stored therein instructions, the memory and instructions being configured such that execution of the instructions by the processor causes the system to receive image data from multiple channels corresponding to disparate input sources, split the image data into data streams, encode the data streams using separate encoders to obtain encoded data at each stage of the separate encoders, concatenate the encoded data across each stage of the separate encoders, encode the concatenated data for each stage of the encoders using a single layer of non-linear units to obtain a feature array, and output the feature array.
According to another aspect of the presently described embodiments, the disparate input sources comprise ultrasound and opto-acoustic input sources.
According to another aspect of the presently described embodiments, the disparate input sources comprise input sources using different sensors on different bands.
According to another aspect of the presently described embodiments, the memory and instructions are further configured such that execution of the instructions by the processor causes the system to use pretrained networks to accelerate learning.
According to another aspect of the presently described embodiments, the memory and instructions are further configured such that execution of the instructions by the processor causes the system to decode the feature array.
According to another aspect of the presently described embodiments, the images include images of cancer lesions.
According to another aspect of the presently described embodiments, the images include satellite images.
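By way of non-limiting illustration, the following is a minimal sketch, in PyTorch, of the flow recited in the above aspects: image data from multiple channels is split into data streams, each stream is encoded by its own encoder, the encoded data are concatenated across streams at each stage, and the concatenation is encoded by a single layer of non-linear units to obtain a feature array per stage. The stream count, stage widths, and convolutional stage design are assumptions chosen for illustration only; pretrained encoder stages, as discussed below, could be substituted.

```python
import torch
import torch.nn as nn

class MultiStreamEncoder(nn.Module):
    """Encode each input channel with its own encoder, then fuse the
    per-stage encoded data across streams with a single non-linear layer."""

    def __init__(self, num_streams=3, widths=(64, 128, 256)):
        super().__init__()

        def make_stream():
            stages, in_ch = [], 1
            for out_ch in widths:
                stages.append(nn.Sequential(
                    nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                    nn.ReLU(inplace=True)))
                in_ch = out_ch
            return nn.ModuleList(stages)

        # One separate encoder per data stream.
        self.streams = nn.ModuleList([make_stream() for _ in range(num_streams)])
        # Per stage: a single layer of non-linear units (1x1 conv + ReLU)
        # encodes the stream features concatenated at that stage.
        self.fuse = nn.ModuleList([
            nn.Sequential(nn.Conv2d(num_streams * w, w, kernel_size=1),
                          nn.ReLU(inplace=True))
            for w in widths])

    def forward(self, x):
        # x: (B, num_streams, H, W); split the channels into data streams.
        feats = [x[:, i:i + 1] for i in range(len(self.streams))]
        fused = []
        for stage, fuse in enumerate(self.fuse):
            feats = [stream[stage](f) for stream, f in zip(self.streams, feats)]
            fused.append(fuse(torch.cat(feats, dim=1)))  # concatenate across streams
        return fused  # one fused feature array per encoder stage


# Example: US, OA1, and OA2 channels of a 256x256 image.
encoder = MultiStreamEncoder(num_streams=3)
features = encoder(torch.randn(2, 3, 256, 256))
print([tuple(f.shape) for f in features])
# [(2, 64, 128, 128), (2, 128, 64, 64), (2, 256, 32, 32)]
```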
According to the presently described embodiments, a technique is provided to use data or features from disparate sources of input data, e.g., different data channels, to analyze image data to, for example, perform image segmentation. Such a method and/or system allows for advantageous and selective exploitation of features of the various channels or sources to improve the process and/or result.
As will be appreciated by those of skill in the art, the presently described embodiments may be implemented in a variety of environments for a variety of different applications. As such,
With reference now to
With reference to
With reference now to
According to various embodiments,
As noted above, in a more specific example of the presently described embodiments,
In one form, the presently described embodiments combine traditional ultrasound (US) and two channels of opto-acoustic response (OA1) and (OA2) corresponding to oxygenated and deoxygenated blood. The technique is related to the UNet segmentation network 400, an example of which is illustrated in
To further explain, the UNet segmentation network 400 does this by first abstracting the image stage by stage, from a detailed but shallow representation to a deep but coarse representation that encodes many features describing the content or composition of the image. This is illustrated by the left side of the network 400: the encoder, which produces features from an image. To accelerate learning, a pretrained network, such as ResNet-50 trained on MS-COCO or ImageNet, is often used for the encoder.
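By way of illustration only, the following sketch taps a pretrained torchvision ResNet-50 at each stage to obtain the shallow-to-deep feature hierarchy described above; ImageNet weights are assumed here since torchvision ships those for its classification backbone.

```python
# A hedged sketch of using a pretrained backbone as the U-Net encoder,
# assuming torchvision is available.
import torch
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.eval()

def encoder_features(x):
    """Return stage-by-stage features: detailed-but-shallow to deep-but-coarse."""
    x = backbone.relu(backbone.bn1(backbone.conv1(x)))  # 1/2 res, 64 channels
    f1 = x
    f2 = backbone.layer1(backbone.maxpool(x))           # 1/4 res, 256 channels
    f3 = backbone.layer2(f2)                            # 1/8 res, 512 channels
    f4 = backbone.layer3(f3)                            # 1/16 res, 1024 channels
    f5 = backbone.layer4(f4)                            # 1/32 res, 2048 channels
    return [f1, f2, f3, f4, f5]

with torch.no_grad():
    stages = encoder_features(torch.randn(1, 3, 224, 224))
print([tuple(f.shape) for f in stages])
```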
The output is produced by a decoder. As shown on the right side of the network 400, the decoder combines the high-level abstract features with some details of the earlier layers to produce an output image, such as image/data 490, that is based on abstractions but uses the earlier levels to get the localization correct in the final output image. To carry the previous "dog" example forward, the abstract layer encodes that there is a dog-like object in the center of the image, and earlier layers are used to obtain sharper boundaries of the dog. The use of multiple layers also provides some natural invariance to scale. Variations of the basic idea of the UNet segmentation network 400 can be created by changing the decoder (e.g., DeepLabv3+). It will be appreciated that changing the decoder may necessitate changes in or to the encoder.
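A minimal sketch of one such decoder stage follows; the bilinear upsampling and channel counts are illustrative assumptions rather than a specification of the network 400.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStage(nn.Module):
    """Upsample deep, abstract features and refine them with the matching
    earlier-layer (skip) features to recover localization."""

    def __init__(self, deep_ch, skip_ch, out_ch):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(deep_ch + skip_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, deep, skip):
        # Bring the coarse, abstract features up to the skip features' resolution.
        deep = F.interpolate(deep, size=skip.shape[-2:],
                             mode="bilinear", align_corners=False)
        # Earlier-layer detail sharpens the boundaries of the abstract content.
        return self.refine(torch.cat([deep, skip], dim=1))


# Example: fuse 1/32-resolution features with 1/16-resolution skip features.
stage = DecoderStage(deep_ch=2048, skip_ch=1024, out_ch=512)
out = stage(torch.randn(1, 2048, 7, 7), torch.randn(1, 1024, 14, 14))
print(tuple(out.shape))  # (1, 512, 14, 14)
```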
In the cancer segmentation application, it should be appreciated that the presently described embodiments, in at least one form, identify different types of tissue that are imaged with a variety of disparate sources. Different pixel colors correspond to background tissue (not a lesion), benign tissue, and malignant cancerous tissue.
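For concreteness, one hypothetical rendering of that color coding is sketched below; the particular colors and class indices are arbitrary assumptions, not mandated by the embodiments.

```python
import numpy as np

# Hypothetical palette: class 0 = background tissue (not a lesion),
# class 1 = benign tissue, class 2 = malignant cancerous tissue.
PALETTE = np.array([[0, 0, 0],
                    [0, 255, 0],
                    [255, 0, 0]], dtype=np.uint8)

def colorize(label_map):
    """Map an (H, W) array of class ids to an (H, W, 3) RGB image."""
    return PALETTE[label_map]
```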
According to the presently described embodiments, it is recognized that the optoacoustic and ultrasound features are typically very different in nature. For example, ultrasound features show tissue boundaries well, whereas optoacoustic features show diffuse patterns of hemoglobin and oxygenation. Also, radiologists in the field have indicated that there is more information in the ratios of the optoacoustic oxy and deoxy channels than in their absolute values (a hedged illustration of this ratio cue is sketched below). It, therefore, makes sense to abstract these features before combining them. This is especially true if we want to take advantage of pretraining. This motivates the idea of a multi-stream network, such as the Temporal Segment Network used in activity recognition, which combines RGB color information with optical flow information. At the same time, it is important for the U-Net to maintain features at multiple scales.
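The sketch below derives a relative oxygenation map from the two OA channels; the epsilon-stabilized normalization is an assumption made for illustration, not a prescribed preprocessing step of the embodiments.

```python
import torch

def oxygenation_ratio(oa_oxy, oa_deoxy, eps=1e-6):
    """Per-pixel ratio of the oxy response to the total OA response.

    Emphasizes relative oxygenation, which radiologists report as more
    informative than the absolute channel values.
    """
    return oa_oxy / (oa_oxy + oa_deoxy + eps)
```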
To address the above noted issues for cancer lesions and other applications having disparate sources and/or differing image features, the presently described embodiments implement features and/or techniques that take the place of traditional features in, for example, a U-Net encoder, such as the encoder described in
With reference to
It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.