The present disclosure belongs to the interdisciplinary field of medicine and computer science, and specifically relates to a liver CT image segmentation system and algorithm based on mixed supervised learning.
CT scanning is a routine medical examination method that uses X-rays and detectors to scan cross sections of the human body. With the help of a computer, the detection information is reconstructed into a cross-sectional view, that is, a slice. Multiple slices can be combined to form a three-dimensional view of the internal organs and tissues of the human body. Pixel-level segmentation of the liver in abdominal CT images is helpful for operations including pathological diagnosis, preoperative planning, and postoperative evaluation of liver diseases, and is of great significance for the treatment and research of liver diseases such as hepatitis, cirrhosis, and liver cancer. Pixel-level segmentation of the liver requires pixel-level annotation of the liver region in each layer of CT slices, which is generally completed by doctors during preoperative planning. However, in order to capture finer tissue lesions, high-resolution CT scanning is usually performed. The pixel resolution of the CT slices is therefore high and the slice thickness small, which increases the number of slices and thus the workload of doctors.
In order to enable doctors to achieve fast and accurate liver segmentation, most existing methods use a convolutional neural network and train it under the supervision of a large number of CT images with pixel-level annotations, so that the network can perform the segmentation task. This kind of learning method is called supervised learning, where the pixel-level annotations of CT images are called strong labels because they carry ground-truth masks for the segmentation task. In contrast, image-level annotations, which indicate only whether a CT slice contains the target object, are called weak labels.
The problem with the above supervised segmentation methods is that pixel-level image annotation requires a significant amount of time and manpower when producing a training dataset, as well as quality control by experienced doctors, making strong labels difficult to obtain. To reduce the dependence of supervised learning on strong labels, the traditional solution is data augmentation, which applies cropping, rotation, flipping, and other operations to the original images to expand the training dataset and improve the generalization ability of the neural network. However, the improvement from this method is limited, and it leaves the large amount of useful data outside the strong-label set unused. In recent years, therefore, many studies have proposed semi-supervised learning methods, which use generative adversarial learning, knowledge distillation, and other techniques to generate pseudo labels for unannotated images and thereby expand the training dataset. However, these methods cannot avoid the impact of incorrect labels on network learning, which makes it difficult to further improve segmentation precision. In this regard, some studies have proposed mixed supervised learning, which enhances the network's generalization ability by using weak labels whose annotation cost is much lower than that of strong labels. Weak labels do not require doctors to accurately locate and annotate lesions, greatly reducing manual annotation costs and enabling large-scale acquisition. Moreover, because weak labels, unlike unlabeled data, carry accurate image-level category information, they provide more effective supervision, achieving a balance between segmentation precision requirements and annotation costs. However, existing mixed supervision methods typically use a multi-task framework, and because of the large number of independent parameters among the multiple tasks, inconsistent results may occur.
In addition, classification tasks affect, to some extent, the feature representation of the network model, which in turn affects segmentation precision.
In view of the advantages, disadvantages, and open problems of the above algorithms, the present disclosure provides a liver CT image segmentation system and algorithm based on mixed supervised learning to improve the accuracy of liver disease examination. The present disclosure is implemented by adopting the following technical solution:
The present disclosure provides a liver CT image segmentation system based on mixed supervised learning, the image segmentation system including an image preprocessing unit, a feature extraction unit, a word vector segmentation unit and a single-layer convolutional classification unit, the image preprocessing unit being in data connection with the feature extraction unit, and the feature extraction unit being respectively in data connection with the word vector segmentation unit and the single-layer convolutional classification unit.
As a further improvement, in the present disclosure, a construction process of the image preprocessing unit includes:
truncating the range of HU values in a CT slice image into [H1, H2], where H1 and H2 respectively represent a lower limit and an upper limit of rough HU values capable of preserving a liver tissue intact and removing a bone structure, and then scaling the size of the slice image to (H0, W0), where (H0, W0) represents the size of an input image of the feature extraction unit.
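The HU truncation and resizing step described above can be sketched as follows. This is an illustrative NumPy sketch: the window [H1, H2] and target size (H0, W0) below are example values chosen for demonstration, not the disclosure's actual constants, and the nearest-neighbour resampler stands in for whatever interpolation a real implementation would use.

```python
import numpy as np

# Example HU window roughly keeping liver tissue and dropping dense bone,
# and an example input size for the feature extraction unit (assumed values).
H1, H2 = -100, 400
H0, W0 = 256, 256

def preprocess_slice(hu_slice: np.ndarray) -> np.ndarray:
    """Clip a CT slice to [H1, H2], then resize it to (H0, W0) by
    simple nearest-neighbour sampling."""
    clipped = np.clip(hu_slice, H1, H2)
    h, w = clipped.shape
    rows = np.arange(H0) * h // H0   # source row index for each output row
    cols = np.arange(W0) * w // W0   # source column index for each output column
    return clipped[np.ix_(rows, cols)]

slice_hu = np.random.randint(-1000, 1500, size=(512, 512))
out = preprocess_slice(slice_hu)
```

After this step, every slice has the fixed size and value range expected by the feature extraction unit.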
As a further improvement, in the present disclosure, a construction process of the feature extraction unit includes:
using U-Net as a basic framework, for an output feature map of a second-to-last convolutional layer of U-Net, the size satisfying (B, C0, H, W), where B represents the number of images in each batch, C0 represents the number of channels, and (H, W) represents the resolution of the feature map, passing the feature map through a convolutional layer with an input channel of C0, an output channel of C and a kernel size of (K0, K0), then performing batch normalization on an output result, and finally passing through a Tanh activation function to obtain a target feature map with a size of (B, C, H, W).
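The convolution–batch-normalization–Tanh head described above can be sketched in NumPy. For simplicity this sketch assumes K0 = 1 (a 1×1 convolution is a per-pixel channel mix, expressible as a matrix product); all sizes are example values, not the disclosure's actual configuration.

```python
import numpy as np

# Example sizes (assumed): batch B, input channels C0, output channels C.
B, C0, C, H, W = 2, 64, 32, 16, 16
rng = np.random.default_rng(0)

feat = rng.standard_normal((B, C0, H, W))    # U-Net second-to-last layer output
weight = rng.standard_normal((C, C0)) * 0.1  # 1x1 convolution kernel (K0 = 1)

# 1x1 convolution: mix the C0 input channels into C output channels per pixel.
mixed = np.einsum('co,bohw->bchw', weight, feat)

# Batch normalization: normalize each output channel over (B, H, W).
mean = mixed.mean(axis=(0, 2, 3), keepdims=True)
var = mixed.var(axis=(0, 2, 3), keepdims=True)
normed = (mixed - mean) / np.sqrt(var + 1e-5)

# Tanh activation yields the target feature map of size (B, C, H, W).
target = np.tanh(normed)
```

The Tanh bounds every feature value to (-1, 1), which keeps the scale of the target feature map stable for the downstream word-vector and classification heads.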
As a further improvement, in the present disclosure, a construction process of the word vector segmentation unit includes:
introducing a learnable word vector v with dimension C, the word vector v convolving with the above target feature map to obtain a heat map with a size of (B, 1, H, W), calculating the confidence of each pixel of the heat map by using a sigmoid activation function to obtain the segmentation confidence of each pixel, and dividing foregrounds and backgrounds by using τs as a threshold, pixels with segmentation confidence less than the threshold being considered background regions, and pixels with segmentation confidence greater than or equal to the threshold being considered foreground regions, that is, liver regions.
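The word-vector segmentation step can be sketched as follows: convolving a C-dimensional vector v with a (B, C, H, W) feature map is equivalent to a per-pixel dot product, giving a (B, 1, H, W) heat map. Sizes and the threshold τs below are illustrative, not the disclosure's actual values.

```python
import numpy as np

B, C, H, W = 2, 32, 16, 16
tau_s = 0.5                                          # example threshold
rng = np.random.default_rng(1)

target = np.tanh(rng.standard_normal((B, C, H, W)))  # stand-in target feature map
v = rng.standard_normal(C)                           # learnable word vector

# Convolving v with the feature map = dot product of v with each pixel's feature.
heat = np.einsum('c,bchw->bhw', v, target)[:, None]  # heat map, (B, 1, H, W)
conf = 1.0 / (1.0 + np.exp(-heat))                   # sigmoid -> confidence per pixel
mask = conf >= tau_s                                 # True = foreground (liver region)
```

During training, v is updated by backpropagation like any other weight, so the segmentation head contributes only C independent parameters.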
As a further improvement, in the present disclosure, a construction process of the single-layer convolutional classification unit includes:
The present disclosure further discloses a segmentation algorithm for the liver CT image segmentation system based on mixed supervised learning, segmentation in the algorithm including a testing stage and a training stage, the testing stage including:
As a further improvement, in the present disclosure, in the training stage, a CT scanning dataset is constructed, and a construction process of the dataset includes: acquiring abdominal CT scanning data, splitting the CT volume into a series of two-dimensional slice images along the human body axis, performing manual classification on all slice images based on whether the slice images contain a liver to obtain weak labels, images with the liver being classified into a foreground, and images without the liver being classified into a background, and randomly selecting from all images classified into the foreground a part of images in a number much less than the total number of the images classified into the foreground to perform pixel-level annotation to obtain strong labels, the number of the strong labels and the number of the foreground weak labels being respectively expressed as s and w.
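The dataset construction can be sketched as a toy split: every slice receives a weak (image-level) label, and a small random subset of the foreground slices additionally receives pixel-level (strong) annotation. The `contains_liver` predicate below is a placeholder for the doctors' manual classification, and all counts are illustrative.

```python
import random

random.seed(0)
slices = list(range(1000))             # indices of the 2D slice images
contains_liver = lambda i: i % 3 != 0  # placeholder for manual classification

# Weak labels: foreground = contains liver, background = does not.
foreground = [i for i in slices if contains_liver(i)]
background = [i for i in slices if not contains_liver(i)]

# Strong labels: a random subset of foreground slices, with s << w.
s = 10
strong = random.sample(foreground, s)  # these get pixel-level annotation
w = len(foreground)                    # number of foreground weak labels
```

Only the `strong` subset needs costly pixel-level annotation; all other foreground slices contribute supervision solely through their image-level labels.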
As a further improvement, in the present disclosure, several images are randomly sampled from the images with the strong labels and are sequentially passed through the image preprocessing unit, the feature extraction unit and the word vector segmentation unit, and a binary cross entropy loss Losss is calculated as follows for each corresponding pixel of the output confidence images and the strong labels after the pixel-level annotation:
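The disclosure's exact formula for Losss is not reproduced here; the following is a standard per-pixel binary cross entropy sketch between the predicted confidence map and the strong-label mask, which matches the description above.

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    """Mean binary cross entropy between confidences p in (0, 1)
    and binary ground-truth masks y, averaged over all pixels."""
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

rng = np.random.default_rng(2)
pred = rng.uniform(0, 1, size=(2, 1, 16, 16))                # segmentation confidences
gt = (rng.uniform(size=(2, 1, 16, 16)) > 0.5).astype(float)  # strong-label masks
loss_s = bce_loss(pred, gt)
```

The loss approaches zero only when every pixel's confidence matches its strong-label value, which is what drives the word-vector segmentation head during training.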
As a further improvement, in the present disclosure, several images are randomly sampled from the foreground images with the weak labels and are sequentially passed through the image preprocessing unit, the feature extraction unit and the single-layer convolutional classification unit to obtain the confidence of the images, and a classification loss Losscf of the foreground is calculated as follows in combination with image-level class labels:
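One plausible sketch of how the single-layer convolutional classification unit produces a per-image confidence: a 1×1 convolution (as stated in the beneficial effects below) maps the (B, C, H, W) target features to a (B, H, W) score map, and global max pooling followed by a sigmoid yields one foreground confidence per image. The pooling step is an assumption, not stated in the text; sizes are illustrative.

```python
import numpy as np

B, C, H, W = 3, 32, 16, 16
rng = np.random.default_rng(3)

target = np.tanh(rng.standard_normal((B, C, H, W)))  # stand-in target feature map
kernel = rng.standard_normal(C) * 0.1                # 1x1 single-layer conv weights

# 1x1 convolution: one score per pixel, shared weights across the image.
scores = np.einsum('c,bchw->bhw', kernel, target)

# Assumed aggregation: global max pooling, then sigmoid -> per-image confidence.
img_conf = 1.0 / (1.0 + np.exp(-scores.max(axis=(1, 2))))  # shape (B,)
```

Comparing `img_conf` against the image-level weak labels with a cross-entropy term then gives the foreground classification loss Losscf.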
Compared with the prior art, the present disclosure has the following beneficial effects:
A multi-task framework is used for respectively performing segmentation and classification tasks, to achieve high segmentation precision through a large amount of weak label data and a small number of strong labels. To address the problem of the large number of independent parameters among multiple tasks, the number of network parameters shared among the multiple tasks is maximized: a learnable word vector v is used to perform the segmentation task, and a single-layer convolution with a 1×1 kernel is used to perform the classification task, reducing the number of independent parameters and improving the consistency of results among the multiple tasks.
For the problem of interference caused by classification tasks on the feature representation of network models, the present disclosure introduces the learnable word vector v to convolve the feature layer, and introduces the triplet loss in metric learning to explicitly learn the feature representation, making pixel-level features of the same class consistent.
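The triplet loss mentioned above can be sketched in its standard metric-learning form on pixel-level features: the anchor and positive come from the same class (e.g. liver pixels), the negative from the other class, and the loss pushes same-class features together and different-class features apart by at least a margin. Feature sizes and the margin value are illustrative.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(d(a, p) - d(a, n) + margin, 0),
    averaged over a batch of feature vectors."""
    d_ap = np.linalg.norm(anchor - positive, axis=-1)  # same-class distance
    d_an = np.linalg.norm(anchor - negative, axis=-1)  # cross-class distance
    return np.maximum(d_ap - d_an + margin, 0.0).mean()

rng = np.random.default_rng(4)
liver = rng.standard_normal((8, 32)) + 2.0    # foreground pixel features (toy)
backgr = rng.standard_normal((8, 32)) - 2.0   # background pixel features (toy)

# Positives: other liver pixels; negatives: background pixels.
loss_t = triplet_loss(liver, np.roll(liver, 1, axis=0), backgr)
```

Because the loss acts directly on pixel-level features, it explicitly shapes the shared feature representation instead of relying only on the segmentation and classification outputs.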
The segmentation performance of mixed supervised learning is significantly better than that of supervised learning using a small number of strong labels. The present disclosure is significantly superior to existing methods in classification and segmentation indicators, can maintain high performance for both classification and segmentation tasks, and can achieve good consistency.
The present disclosure provides a liver CT image segmentation system based on mixed supervised learning. The image segmentation system includes an image preprocessing unit, a feature extraction unit, a word vector segmentation unit and a single-layer convolutional classification unit. The image preprocessing unit is in data connection with the feature extraction unit. The feature extraction unit is respectively in data connection with the word vector segmentation unit and the single-layer convolutional classification unit.
A construction process of the image preprocessing unit includes:
A construction process of the feature extraction unit includes:
A construction process of the word vector segmentation unit includes:
A construction process of the single-layer convolutional classification unit includes:
The present disclosure further discloses a liver CT image segmentation algorithm based on mixed supervised learning. In a testing stage of the algorithm, the following steps are executed.
In a training stage of the algorithm, the following steps are executed. A CT scanning dataset is constructed, and abdominal CT scanning data are acquired. The CT volume is split into a series of two-dimensional slice images along the human body axis. Manual classification is performed on all slice images based on whether the slice images contain a liver to obtain weak labels. Images with the liver are classified into a foreground. Images without the liver are classified into a background. A part of images in a number much less than the total number of the images classified into the foreground are randomly selected from all images classified into the foreground to perform pixel-level annotation to obtain strong labels.
The number of the strong labels and the number of the foreground weak labels are respectively expressed as s and w.
Several images are randomly sampled from the foreground images with the weak labels and are sequentially passed through the image preprocessing unit, the feature extraction unit and the single-layer convolutional classification unit to obtain the confidence of the images. A classification loss Losscf of the foreground is calculated as follows in combination with image-level class labels:
The present disclosure was experimented on a publicly available LiTS abdominal CT dataset. The ratio of strong labels to foreground weak labels was 1:49. Firstly, ablation experiments were conducted on whether to introduce a triplet loss. The results are as follows:
Accuracy, precision, recall, F1, and AUC are commonly used indicators for classification tasks, while Dice and IOU are commonly used indicators for segmentation tasks. According to the results in Table 1, introducing the triplet loss can significantly improve the performance of the network in classification and segmentation tasks. In addition, the comparison results between the present disclosure and the existing mixed supervision method are as follows:
U-Net represents supervised learning using strong labels based on a U-Net framework. U-Net-m represents a multi-task framework that uses U-Net to complete the segmentation task and completes the classification task by passing intermediate-layer results through a fully connected network. Pawel-19 represents the multi-task network structure in Pawel et al.'s “Deep learning with mixed supervision for brain tumor segmentation”. As shown by the results, the segmentation performance of mixed supervised learning is significantly better than that of supervised learning using a small number of strong labels. The present disclosure is significantly superior to existing methods in classification and segmentation indicators, can maintain high performance for both classification and segmentation tasks, and can achieve good consistency.
The above implementations are only exemplary implementations of the present disclosure, which are not intended to limit the scope of protection of the present disclosure. Any non-substantial changes or replacements made by those skilled in the art on the basis of the present disclosure still belong to the scope of protection of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
202111180680.6 | Oct 2021 | CN | national |
The present application is a Continuation Application of PCT Application No. PCT/CN2022/101697 filed on Jun. 27, 2022, which claims the benefit of Chinese Patent Application No. 202111180680.6 filed on Oct. 11, 2021. All the above are hereby incorporated by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/CN2022/101697 | Jun 2022 | WO |
Child | 18608929 | US |