This invention relates to the segmentation of data, particularly but not exclusively data produced using a multi-energy computed tomography (CT) scanning system.
Multi-energy computed tomography (CT) is an x-ray imaging modality which produces 3D images of the inside of objects. CT scanners use polychromatic x-ray sources which emit a full rainbow or spectrum of x-rays with various ‘colours’ (x-ray energies). In regular CT there is no distinction made between the different energies of x-rays. However, x-rays are absorbed differently by different materials in the body, and absorption also depends on x-ray energy. Multi-energy CT measures the absorption of x-rays in different energy ranges. Using the differences in x-ray absorption in these energy ranges it is possible to discriminate between (identify) and quantify various materials in a scanned subject.
Interactive image segmentation is a process of partitioning an image into disjoint and meaningful regions with the help of user guidance. Image segmentation in general plays an important role in a wide range of medical imaging applications and analysis tasks.
For example, image segmentation may allow the separation of trabecular and cortical bone to evaluate bone health, the quantification of nanoparticles in a particular organ, tumour segmentation, region-specific material quantification, etc. However, accurate and automatic segmentation of image data produced by a multi-energy CT system remains a challenging problem. Some segmentation solutions for dual-energy CT image data are known. However, little to no labelled data, which is required for designing automatic segmentation solutions, is available for the segmentation of data produced using multi-energy CT.
Several pre-clinical studies have shown that multi-energy CT is able to differentiate various types of tissues in the human body, as well as different contrast agents commonly used in CT imaging [1], [2]. However, present interactive segmentation methods are not designed to take advantage of the additional information provided by multi-energy CT images.
In the interactive image segmentation literature, both random walks [3] and graph cuts [4] model the image as a graph and learn a probability model from foreground and background scribbles given by the user. Grabcut [5] improved graph cuts by changing the interaction type from scribbles to a bounding box around the object. Geodesic star convexity [6] employed star convexity constraints on graph cuts to try to connect foreground objects and background objects. Geodesic graph cuts [7] is similar except that it uses the geodesic distance between pixels instead of geodesic star convexity constraints.
A range of methods use superpixels to perform fewer computations. One example is maximal-similarity region merging [8], an iterative region merging approach. However, most superpixel driven methods assume the input image to be a single channel or an RGB image, and use only low-level features to segment the image into the foreground and background. Region-based methods [9], [8], [10] use preliminary features such as histograms and mean intensities to represent superpixels. Due to the multi-channel nature of multi-energy CT datasets (e.g. the Applicant's datasets currently range from 4 to 8 channels), following the same strategy increases the histogram length by the number of bins per channel times the number of additional channels, which in turn may cause curse of dimensionality problems.
Recently proposed deep learning based methods are shown to be very effective [11], [12]. They use convolutional neural networks to learn the object boundaries and shapes. However, in addition to requiring large training datasets, they suffer from limiting the algorithm to a fixed set of foreground objects.
For the purposes of this specification, the term “multi-energy CT” refers to CT with 3 or more energy ranges or bins. This is distinct from dual-energy CT which uses only two energies.
It is an object of the invention to provide a computer implemented image segmentation method suitable for use with data produced using a multi-energy CT system.
Alternatively, it is an object of the invention to provide a computer implemented image segmentation method incorporating a bag of features method.
Alternatively, it is an object of the invention to at least provide the public with a useful choice.
In one aspect the invention may provide a computer-implemented method for segmentation of multi-energy CT data, the multi-energy CT data including data in three or more energy bands, the method including: receiving in memory the multi-energy CT data; displaying the multi-energy CT data to a user; receiving user input of one or more region indicators for the displayed data; dividing the data into superpixels; labelling at least some of the superpixels based on the received region indicators; constructing feature vectors for at least some of the superpixels; based on the labelled superpixels and feature vectors, classifying the superpixels using a machine learning classifier; and segmenting the data into two or more regions based on the classification of the superpixels.
The machine learning classifier may be a Support Vector Machine.
The segmented data may be displayed to a user. The segmented data may be stored in memory.
Dividing the data may be achieved using Simple Linear Iterative Clustering. The superpixels may be non-overlapping superpixels.
Classifying the superpixels may include classifying global superpixel level descriptors into different classes.
Constructing feature vectors for at least some of the superpixels may be performed using a compact coding process.
Constructing feature vectors for at least some of the superpixels may be performed using a Bag of Features, Bag of Words, Fisher Vectors or Vector of Locally Aggregated Descriptors (VLAD) process.
Constructing feature vectors for at least some of the superpixels may include encoding and pooling the feature vectors. Encoding may include clustering labelled superpixels to create a codebook of visual words. Encoding may be performed using clustering techniques selected from k-means clustering, fuzzy clustering or gaussian mixture models. Pooling may include generating feature vectors using the codebook of visual words and augmented data from each superpixel.
A plurality of feature vectors may be constructed per superpixel.
Construction of feature vectors for a superpixel may include randomly sampling pixels from that superpixel.
One or more post processing steps may be performed, selected from smoothing object boundaries and correcting misclassified object regions.
If the segmentation of the image is determined to be unacceptable, the method may include receiving user input of one or more further region indicators for the displayed data; labelling at least some of the superpixels based on the received further region indicators; based on the labelled superpixels and feature vectors, reclassifying the superpixels using the linear Support-Vector Machine; and resegmenting the image into two or more regions based on the reclassification of the superpixels.
The image may be augmented with one or more of: texture information, horizontal gradient information and vertical gradient information.
The texture information may be generated from semi-local pixel information.
The feature vectors may be all of the same length.
In another aspect a computer-implemented method for segmentation of multi-energy CT data, the multi-energy CT data including data in three or more energy bands, may include: receiving the multi-energy CT data in memory; displaying the multi-energy CT data to a user; receiving user input of one or more region indicators for the displayed data; dividing at least some of the multi-energy CT data into superpixels; labelling at least some of the superpixels based on the received region indicators; constructing feature vectors for at least some of the superpixels; and based on the feature vectors and labelled superpixels, segmenting the image into two or more regions.
In a further aspect, a computer-implemented method for segmentation of multi-energy CT data, the multi-energy CT data including a plurality of data elements in three or more energy bands, may include receiving the multi-energy CT data in memory; displaying the multi-energy CT data to a user; dividing the multi-energy CT data into clusters of data elements; receiving user input of one or more region indicators for the displayed data; based on the received region indicators, labelling one or more of the clusters of data elements; constructing feature vectors for at least some clusters of the data elements; based on the feature vectors and labelled clusters of data elements, classifying the clusters of data elements; and segmenting the image into two or more regions based on the classification of the clusters of data elements.
The data elements may be pixels or voxels.
The clusters of data elements may be superpixels or supervoxels.
A multi-energy CT method may include performing a CT scan using a multi-energy CT system using three or more energy bands, to produce multi-energy CT data; and segmenting the multi-energy CT by any of the methods set out above.
In another aspect, a computer-implemented method for segmentation of image data, the image data including a plurality of data elements, may include receiving the image data in memory; displaying the image data to a user; receiving user input of one or more region indicators for the displayed image; based on the received region indicators, labelling one or more of: one or more of the data elements; and one or more clusters of the data elements; using a Bag of Features, Bag of Words, Fisher Vectors or Vector of Locally Aggregated Descriptors process to construct feature vectors for one or more of: at least some of the data elements; and at least some clusters of the data elements; based on the feature vectors and labelled data elements and/or labelled clusters of data elements, segmenting the image into two or more regions.
A multi-energy CT data segmentation system, may include: memory arranged to store multi-energy CT data including data in three or more energy bands; a display arranged to display the multi-energy CT data to a user; a user input device arranged for user input of one or more region indicators for the displayed data; and a processor arranged to: divide at least some of the multi-energy CT data into superpixels; label at least some of the superpixels based on the received region indicators; construct feature vectors for at least some of the superpixels; and based on the labelled superpixels and feature vectors, classify the superpixels using a linear Support-Vector Machine; and segment the data into two or more regions based on the classification of the superpixels.
A multi-energy CT system, may include: a multi-energy CT scanner configured to scan a subject to produce multi-energy CT data including data in three or more energy bands; memory arranged to store the multi-energy CT data; a display arranged to display the multi-energy CT data to a user; a user input device arranged for user input of one or more region indicators for the displayed data; and a processor arranged to: divide at least some of the multi-energy CT data into superpixels; label at least some of the superpixels based on the received region indicators; construct feature vectors for at least some of the superpixels; and based on the feature vectors and labelled superpixels, segment the image into two or more regions.
In another aspect, a computer-implemented method for segmentation of an image produced using a multi-energy CT system using three or more energy bands, may include:
Superpixel generation may be achieved using Simple Linear Iterative Clustering.
Labelled superpixels may be represented as
$$\{(S_i, L_i)\}_{i=1}^{N}, \quad L_i \in \{0, 1, \ldots, M-1\} \tag{1}$$
where Si denotes the ith labelled superpixel, and Li denotes its corresponding label and M is the maximum number of classes Li takes its value from.
The compact coding process may include encoding and pooling the feature vectors.
One or more post processing steps may be performed, selected from smoothing object boundaries and correcting misclassified object regions using connected component analysis.
Actions d)-i) may be repeated to obtain improved segmentation accuracy.
The compact coding process may be selected from a Bag of Features, Bag of Words, Fisher Vectors or Vector of Locally Aggregated Descriptors (VLAD).
The method may include creating an augmented image to improve classification accuracy. Creating an augmented image may include augmenting the image with texture and horizontal and vertical gradient information to improve segmentation.
The texture information may be generated from semi-local pixel information.
Encoding may include clustering labelled superpixels to create a codebook of visual words.
Encoding may be performed using clustering techniques selected from k-means clustering, fuzzy clustering or gaussian mixture models.
Pooling may include generating feature vectors using the codebook of visual words and augmented data from each superpixel.
Encoding and pooling may be completed using a Vector of Locally Aggregated Descriptors (VLAD) framework.
The number of feature descriptors/vectors generated per superpixel may be increased to artificially produce more training data for classification at h).
A system for segmentation of an image produced using a multi-energy CT system using three or more energy bands, may include:
A system for segmentation of an image produced using a multi-energy CT system using three or more energy bands, may include:
Further aspects of the invention, which should be considered in all its novel aspects, will become apparent to those skilled in the art upon reading of the following description which provides at least one example of a practical application of the invention.
One or more embodiments of the invention will be described below by way of example only, and without intending to be limiting, with reference to the following drawings, in which:
The invention disclosed herein relates to the interactive segmentation of multi-energy CT data sets. Segmentation may be performed using a bag of features method or other compact coding process. The invention also relates to one or more CT scanning systems, software facilities, computer program products, computer systems or computer implemented methods for the interactive segmentation of multi-energy CT data sets.
The Applicant's segmentation method may be used, for example, for segmenting or isolating a particular region in multi-energy CT data. In CT scans on human or animal subjects, the region to be isolated may be a particular organ, part of an organ, bone, part of a bone, tumour or other anatomical region.
In the literature, most existing segmentation methods are limited to performing a specific task or are tied to a particular imaging modality. Therefore, when generalized methods are applied to datasets produced using multi-energy CT, the additional energy information acquired from the CT scanner cannot be sufficiently utilized.
Described herein is a new approach that circumvents this problem by effectively aggregating the data from multiple channels. The method solves a classification problem to get the solution for segmentation.
Starting with a set of labelled pixels, the data may be partitioned using superpixels.
Then, a set of local descriptors, extracted from each superpixel, may be encoded into a code-book and pooled together to create a global superpixel level feature vector (bag of features representation).
The vector of locally aggregated descriptors may be employed as the encoding/pooling strategy, as it is efficient to compute and leads to good results with simple linear classifiers.
A linear-Support Vector Machine may be used to classify the superpixels into different labels.
The proposed method was evaluated on multiple datasets produced using a multi-energy CT scanner. Experimental results show that the method disclosed herein achieved an average increase in accuracy of more than 10% over other known methods.
In one embodiment, the method may specifically focus on creating high-level, fixed-length feature vectors for superpixels using a bag of features approach. Generating feature vectors in this way not only aggregates information from all spectral channels, but also gives us the ability to control the feature vector lengths.
Data augmentation strategies are also proposed to increase the number of feature vectors to have better classification accuracy.
Upon evaluating against a set of state-of-the-art methods, the method described herein achieves higher segmentation accuracy compared to other traditional methods.
In the example below, we solve a classification problem to interactively segment spectral CT images into two regions (‘foreground’ and ‘background’) using a bag of features approach.
The method 100 of the proposed segmentation process for a single 2D slice of multi-energy CT data is depicted in
In method 100 (
These labelled superpixels may then be studied to generate feature vectors using a bag of features approach. For better accuracy while computing feature vectors, as an optional step an augmented image may be generated using high level features by adding texture and gradient maps. Then, image segmentation may be performed using a linear Support Vector Machine (SVM) or other suitable machine learning based classifier, with the help of the labelled superpixels. As shown in
Scan data from a multi-energy CT scanner may be stored in memory. An image that is derived from the CT scan data may be loaded 101 and divided into superpixels 102. The image 101 may generally be, or be based on, a 2D slice of multi-energy CT data.
A superpixel is a group of pixels merged together into a meaningful sub image region. Unlike pixels in a rigid pixel grid, a superpixel can be of any size and shape. Superpixels capture the redundancy of pixels in sub image regions, and provide a convenient form from which to extract features. Overall, having fewer data points to process reduces the computational requirements for subsequent processing tasks.
One major advantage of superpixels is their robustness towards noise in the image. Like other medical datasets, multi-energy CT datasets often contain noise and at times image artifacts [14]. Using superpixels rather than individual pixels for segmentation may smooth out some of the inherent noise present in the image.
There are many approaches to generate superpixels. For example, Simple Linear Iterative Clustering (SLIC) [15] may be used. Other suitable methods of generating superpixels may include: derivatives of SLIC (e.g. manifold SLIC (MSLIC)); Gaussian generative models and derivatives thereof; Normalized cuts; Turbopixels; EWVCT; Vcells; BGD (bilateral geodesic distance); LRW (lazy random walk); DBSCAN and other available superpixel generation methods.
These methods may provide varying performance and quality of superpixel generation and therefore of the resulting segmentation.
SLIC is an adaptation of the k-means [16] clustering approach to efficiently generate superpixels. Previous research has shown that SLIC has a better tradeoff between computational complexity, accuracy, and boundary compactness than most state-of-the-art algorithms [17], [18].
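The idea of SLIC as k-means in a joint intensity and position space can be sketched as follows. This is a simplified, SLIC-style illustration only, not the reference SLIC algorithm: it searches all cluster centres globally rather than within SLIC's local window, and it does not enforce connectivity. The function name `slic_like` and the parameters `n_segments`, `compactness` and `n_iter` are assumptions made for this example.

```python
import numpy as np

def slic_like(image, n_segments=16, compactness=0.1, n_iter=10):
    """Cluster pixels of an (H, W, C) image in joint (channel, position) space.

    Simplified sketch: global k-means with grid-initialised centres.
    Memory use is O(H*W*K), so this is only suitable for small images.
    """
    h, w, c = image.shape
    step = max(1, int(np.sqrt(h * w / n_segments)))  # grid spacing S
    ys, xs = np.mgrid[0:h, 0:w]
    # Each pixel: its C channel values plus coordinates scaled by m / S.
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)
    feats = np.concatenate(
        [image.reshape(-1, c), compactness / step * coords], axis=1
    ).astype(float)
    # Initialise cluster centres on a regular grid, as SLIC does.
    gy = np.arange(step // 2, h, step)
    gx = np.arange(step // 2, w, step)
    seeds = (gy[:, None] * w + gx[None, :]).ravel()
    centres = feats[seeds].copy()
    for _ in range(n_iter):
        # Assign every pixel to its nearest centre (global search).
        d2 = ((feats[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its assigned pixels.
        for k in range(len(centres)):
            members = feats[labels == k]
            if len(members):
                centres[k] = members.mean(axis=0)
    return labels.reshape(h, w)
```

Because all channels enter the distance, the same sketch applies unchanged to images with 4 to 8 energy channels. A production system would more likely rely on an established SLIC implementation.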
For the implementation of interactive segmentation, the image is displayed to a user. The user inputs one or more region indicators, e.g. a set of strokes 103 via any suitable user input, including e.g. touch screens, pointer devices, computer mouse etc, to indicate the pixels that belong to a particular region or class. Strokes, scribbles or markings may be entered onto an image on a graphical interface by an informed user to indicate foreground and background regions in an image.
As the image is already divided into superpixels, each superpixel may be considered labelled if any one pixel inside it is marked by a stroke. This assumption may be considered acceptable as superpixels tend to contain pixels that are homogeneous in nature. When a superpixel is labelled, all of the pixels contained within it may be automatically labelled as well. Let the number of labelled superpixels be N, then all the labelled superpixels can be represented as
$$\{(S_i, L_i)\}_{i=1}^{N}, \quad L_i \in \{0, 1, \ldots, M-1\} \tag{1}$$
where Si denotes the ith labelled superpixel, and Li denotes its corresponding label. M is the maximum number of classes Li takes its value from (M=2 in this example).
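The labelling rule described above (a superpixel is labelled if any one of its pixels is marked by a stroke) might be sketched as below. The array encoding is an assumption for illustration: `strokes` holds -1 for unmarked pixels and a class index otherwise. The tie-breaking rule for a superpixel touched by strokes of more than one class is also an assumption, since that case is not addressed in the description.

```python
import numpy as np

def label_superpixels(sp_labels, strokes):
    """Return {superpixel id: class} for superpixels touched by a stroke.

    sp_labels: (H, W) int array of superpixel ids.
    strokes:   (H, W) int array, -1 where unmarked, else a class in 0..M-1.
    """
    labelled = {}
    marked = strokes >= 0
    for sp_id, cls in zip(sp_labels[marked], strokes[marked]):
        # Assumption: if strokes of different classes touch one superpixel,
        # the first stroke pixel encountered (row-major order) wins.
        labelled.setdefault(int(sp_id), int(cls))
    return labelled
```

The number of entries in the returned dictionary corresponds to N in expression (1).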
The multi-energy CT image may be augmented 104 with texture and horizontal and vertical gradient information to improve segmentation.
Texture features based on semi-local image information: It is impossible to quantify texture using information from a single pixel, since texture is semi-local by nature. Semi-local information consists of a close neighbourhood around the current pixel, whereas local information includes just the position and value of the pixel in consideration. Such a neighbourhood can also be called a patch.
Consider for instance a 2D grayscale image I={Iij:(i,j)∈Ω}. It can be viewed as a function I(x,y):Ω→R.
To denote the semi-local information, a square patch Pxy(I) of size n×n (where n is an odd positive integer) centered at pixel (x, y) may be extracted.
It may be considered that the given input image is obtained by discretizing a differentiable surface that allows the use of some efficient tools from classic differential geometry.
Sochen et al. [24] proposed to represent the image as a Riemannian manifold embedded in a higher dimensional space. For example, a 2D gray image $I:\mathbb{R}^2 \to \mathbb{R}$ can be viewed as a surface (denoted by Σ) with local coordinates (x, y) embedded in $\mathbb{R}^3$ by a mapping $M_{xy}(I) \to (x, y, I(x, y))$.
Using formula (2) above, this manifold based representation can be extended to support semi-local information at location (x, y)
$$M_{xy}(I) \to (x, y, P_{xy}(I)) \tag{3}$$
In the above mapping, the first two components indicate local information and the semi-local information is included in the form of a patch Pxy(I). From the theory of differential geometry, the area of an element on the surface Mxy(I) is defined as
The partial derivatives ∂P/∂x and ∂P/∂y can be computed using forward differences since the image is discrete. Using this area definition, the rate of change of the area of a surface element is defined as
In the regions where texture is present, the intensity variations in the local neighborhood cause the corresponding surface to change with different G values. For images with more than one channel, (4) can be extended to support multiple channels. Let I=(I1, I2, . . . , Id), where d represents the number of channels. From formula (3), the corresponding manifold based representation can be modified as Mxy(I)→(x, y, Pxy(I1), . . . , Pxy(Id)). The final rate of area change for a multi-channel image becomes:
The texture descriptor T is finally defined as $T = \exp(-G^2)$. After applying the texture descriptor T to the input image I, the resulting single channel image can be denoted by $I_t = \{t_{ij} : (i, j) \in \Omega\}$. In a given textured region, pixel T values within that region will be similar. Across the image there may be different texture regions, where the value of T may change. In addition to the Gaussian smoothing, the semi-local nature of this texture descriptor over multiple channels makes it more robust to noise than other texture descriptors [22].
While one example of use of augmentation has been described, the skilled reader will understand that other methods of image augmentation may be suitable. In particular, methods that augment the CT data in order to improve segmentation may be used. Augmentation with texture information may be used. Augmentation with horizontal and/or vertical gradient information may be used.
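As a minimal sketch of the gradient part of the augmentation, the horizontal and vertical gradient maps may be appended to the image as extra channels. Forward differences are used here; computing the gradients on the mean over the energy channels is an assumption made for this example, as is the function name `augment_with_gradients`. The texture channel is omitted.

```python
import numpy as np

def augment_with_gradients(image):
    """Append horizontal and vertical gradient maps to an (H, W, C) image.

    Assumption: gradients are taken on the per-pixel mean over channels.
    Returns an (H, W, C + 2) array.
    """
    mean = np.asarray(image, dtype=float).mean(axis=2)
    gx = np.zeros_like(mean)
    gy = np.zeros_like(mean)
    gx[:, :-1] = mean[:, 1:] - mean[:, :-1]  # forward difference along x
    gy[:-1, :] = mean[1:, :] - mean[:-1, :]  # forward difference along y
    return np.concatenate([image, gx[..., None], gy[..., None]], axis=2)
```

Each pixel of the augmented image then carries a local descriptor whose length equals the number of energy channels plus the number of added maps.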
Returning to
Feature vectors give a meaningful and quantitative representation to superpixels. Having discriminative feature vectors for the foreground and background can lead to accurate segmentation. As mentioned before, colour histograms or mean intensities might not be able to distinguish between foreground and background objects with similar visual properties, or may suffer from the curse of dimensionality. To avoid such situations, the Applicant's method may use a bag of features based approach to construct feature vectors of uniform or fixed length [25].
In the pooling stage (Stage 2,
Jégou et al. [26] have shown that bag of features can have multiple variations based on how the encoding and pooling processes are performed. For example, encoding can be done using different clustering techniques such as k-means clustering, fuzzy clustering or Gaussian mixture models, etc. The simplest form of pooling is constructing a histogram using the information from the codebook.
In one proposed method, the Vector of Locally Aggregated Descriptors (VLAD) framework is used for encoding and pooling [27], [26].
In VLAD, encoding involves using k-means clustering to generate the codebook C. This codebook is the basis for constructing global descriptors for all the superpixels (labelled and unlabelled included). During pooling, let each superpixel feature vector be represented by V={V(iK+j):i=0 . . . K−1, j=0 . . . d−1}, where i and j index the codebook and the local descriptor component respectively. In this case, the number of local descriptor components d is equal to the number of channels in the augmented image. Hence, each component of V is represented as
where pj and ci,j respectively denote the jth component of the local descriptor p and its corresponding nearest visual word denoted by the nearest neighbour (NN).
The resulting feature vector V is L2-normalized ($V := V / \|V\|_2$), and then subsequently power normalized ($V_i := \operatorname{sign}(V_i)\sqrt{|V_i|}$) to enhance superpixel specific components, and to unsparsify the descriptors respectively [28]. While calculating the feature vectors, there is no hard and fast rule on how to select the number of clusters (K). However, using large values of K makes the descriptors unnecessarily large as well as sparse.
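The encoding and pooling steps can be illustrated with the following sketch. It assumes the standard VLAD formulation of [26], in which the component for visual word i and descriptor dimension j sums the residuals p_j − c_{i,j} over all local descriptors p whose nearest visual word is c_i, and it applies the two normalizations in the order stated above. The helper names `kmeans` and `vlad`, the plain Lloyd iteration and the row-major flattening of the K×d residual matrix are choices made for this example only.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook of visual words."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False)
    centres = points[start].astype(float)
    for _ in range(n_iter):
        d2 = ((points[:, None] - centres[None]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        for i in range(k):
            if np.any(nearest == i):
                centres[i] = points[nearest == i].mean(axis=0)
    return centres

def vlad(descriptors, codebook):
    """Aggregate (P, d) local descriptors into one vector of length K * d."""
    k, d = codebook.shape
    d2 = ((descriptors[:, None] - codebook[None]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)              # nearest visual word per descriptor
    v = np.zeros((k, d))
    for i in range(k):
        assigned = descriptors[nearest == i]
        if len(assigned):
            v[i] = (assigned - codebook[i]).sum(axis=0)  # sum of residuals
    v = v.ravel()
    norm = np.linalg.norm(v)
    if norm > 0:
        v = v / norm                          # L2 normalization
    return np.sign(v) * np.sqrt(np.abs(v))    # power normalization
```

Whatever the number of pixels in a superpixel, the output length is fixed at K·d, which is what allows superpixels of different sizes to be compared by a single classifier.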
As stated earlier, in our proposed approach the segmentation problem may be seen as a classification problem. Feature vectors corresponding to the labelled superpixels may be used to train a SVM classifier 106 [29]. The SVM may be a linear SVM. In general, any suitable machine learning based classifier, including any suitable supervised learning model and in particular any SVM may be used, including a multiclass SVM classifier, One-vs-One SVM or One-vs-Rest SVM.
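For illustration, a two-class linear SVM can be trained with a few lines of NumPy. The sketch below minimises the L2-regularised hinge loss by full-batch subgradient descent; the optimiser, the function names `train_linear_svm` and `predict`, and the hyperparameters `lam`, `lr` and `n_iter` are assumptions for this example and are not taken from the description. A practical system would more likely use an established SVM library.

```python
import numpy as np

def train_linear_svm(x, y, lam=0.01, lr=0.1, n_iter=500):
    """Fit a linear SVM on features x (N, D) and labels y (N,) in {0, 1}.

    Minimises lam/2 * |w|^2 + mean(hinge loss) by subgradient descent.
    """
    t = np.where(y == 1, 1.0, -1.0)           # map labels to -1 / +1
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(n_iter):
        margins = t * (x @ w + b)
        active = margins < 1                  # samples inside the margin
        grad_w = lam * w - (t[active, None] * x[active]).sum(axis=0) / len(x)
        grad_b = -t[active].sum() / len(x)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Return class 1 (foreground) where the decision value is positive."""
    return (x @ w + b > 0).astype(int)
```

In the segmentation setting, `x` would hold the feature vectors of the labelled superpixels and `y` their stroke-derived labels; `predict` would then be applied to the feature vectors of the unlabelled superpixels.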
In a standard approach, the classifier gets as many feature vectors for training as there are labelled superpixels. However, in the current method, the number of feature vectors generated per superpixel may be increased to artificially produce more training data. A feature vector may be generated by randomly sampling pixels from a superpixel. This procedure may be performed several times on each superpixel to obtain multiple feature vectors per superpixel. All the feature vectors generated from a single superpixel share the same label. The random sampling may be done with replacement, so the total data from which the sampling is made is constant every time. In this way, more training data can be generated for better classification accuracy. This process of increasing the number of training samples is also called data augmentation [30]. Generating multiple feature vectors from a single superpixel through random sampling is reasonable because superpixels are nearly homogeneous over all the pixels within them.
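The random sampling step might look like the sketch below, which draws several pixel subsets, with replacement, from the local descriptors of one superpixel. The function name `sample_pixel_sets` and the parameters `n_sets` and `n_samples` are hypothetical; the description does not state how many subsets or pixels per subset are used.

```python
import numpy as np

def sample_pixel_sets(pixels, n_sets, n_samples, seed=0):
    """Draw n_sets random subsets (with replacement) from one superpixel.

    pixels: (P, D) array of local descriptors, one row per pixel.
    Returns an (n_sets, n_samples, D) array; each subset would then be
    encoded into one feature vector carrying the superpixel's label.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(pixels), size=(n_sets, n_samples))
    return pixels[idx]
```

Because sampling is with replacement, every draw is made from the same full set of pixels, so a labelled superpixel contributes `n_sets` training vectors instead of one.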
Once the training is over, the same codebook C is used to generate feature vectors for all the unlabelled superpixels. The trained SVM classifier uses these feature vectors to propagate labels to the entire image. Typically, this is how classification may be done to assign labels to the unlabelled samples. However, to give each superpixel its local context, the feature vectors may be modified as a weighted average of neighbourhood superpixels.
Feature vector ‘V(i)’ for a superpixel ‘i’ (Si) is defined as
where wi completes the weighted average, Ni is the set of neighbouring superpixels of Si, wrij is a range based weight based on the difference in mean intensities of the superpixels, and wdij is a domain based weight based on spatial distance.
σd, σr are smoothing parameters, M(⋅) denotes the mean intensity of the superpixel, and dist(i, j) is the Euclidean distance between the centroids (spatial) of superpixels i and j.
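A hedged sketch of this neighbourhood weighting is given below. The Gaussian form of the range and domain weights (as in bilateral filtering), the inclusion of the superpixel itself in the average, and the function name `smooth_features` are all assumptions made for illustration; the exact expressions should be taken from the definitions above. The default values of `sigma_r` and `sigma_d` follow the parameter values reported for the experiments.

```python
import numpy as np

def smooth_features(features, means, centroids, neighbours,
                    sigma_r=1.8, sigma_d=1.3):
    """Replace each superpixel feature vector by a weighted neighbourhood average.

    features:   (S, F) feature vectors, one per superpixel.
    means:      (S,) mean intensity of each superpixel.
    centroids:  (S, 2) spatial centroids.
    neighbours: list of S lists of neighbouring superpixel indices.
    """
    features = np.asarray(features, dtype=float)
    out = np.empty_like(features)
    for i, nbrs in enumerate(neighbours):
        js = np.array([i] + list(nbrs))        # assumption: include i itself
        # Range weight: similarity of mean intensities (assumed Gaussian).
        w_r = np.exp(-(means[js] - means[i]) ** 2 / (2 * sigma_r ** 2))
        # Domain weight: Euclidean distance between centroids (assumed Gaussian).
        dist2 = ((centroids[js] - centroids[i]) ** 2).sum(axis=1)
        w_d = np.exp(-dist2 / (2 * sigma_d ** 2))
        w = w_r * w_d
        out[i] = (w[:, None] * features[js]).sum(axis=0) / w.sum()
    return out
```

Under these assumptions, a superpixel with no neighbours keeps its own feature vector, and distant or dissimilar neighbours contribute little to the average.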
Once classification is performed, a few post-processing steps are needed to fine tune the result.
Once superpixels are classified into foreground and background using SVM, all the pixels within a superpixel are assigned the same class as the superpixel they belong to. This process is repeated for all the superpixels within the image to label/segment the entire image 107.
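Propagating the superpixel classes to pixels amounts to a lookup, as the short sketch below shows. It assumes superpixel ids are consecutive integers starting at 0; the function name `superpixels_to_mask` is hypothetical.

```python
import numpy as np

def superpixels_to_mask(sp_labels, sp_classes):
    """Assign every pixel the class of the superpixel it belongs to.

    sp_labels:  (H, W) int array of superpixel ids in 0..S-1.
    sp_classes: length-S sequence, sp_classes[k] is the class of superpixel k.
    """
    return np.asarray(sp_classes)[sp_labels]
```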
During one image segmentation process, all the pixels in the output image belonging to the same class are assigned the same intensity value (or some other distinction between the pixel classes may be made). For a two-class problem, this means that the resulting image is a binary image. However, this result may need some post-refinement 108 due to superpixel misclassification or imperfections during superpixel generation. Superpixel misclassification results in foreground objects having background patches, and vice versa, because the foreground and background objects share similar visual properties. To reduce this problem, connected component analysis may be used to discard such regions. It preserves only the regions with strokes; other misclassified regions may be discarded and merged into the background/foreground.
After the post-refinement, the end result may be displayed to the user for feedback. If the user is satisfied with the generated segmentation result, segmentation is complete. If not, the user may give additional input strokes to refine the result. Once the additional strokes are given, the entire process may be repeated to segment the image using the old and new strokes.
During superpixel classification, there may be some false positives with similar visual properties. Connected component analysis is used to discard such regions. Connected component analysis only preserves the regions with strokes, and other misclassified regions are discarded and merged into the background.
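The stroke-based refinement can be sketched as a flood fill that keeps only those foreground components containing at least one foreground stroke pixel. The use of 4-connectivity and the function name `keep_stroked_regions` are assumptions for this example. Only the foreground direction is shown; background patches inside foreground objects could be handled by applying the same routine to the inverted mask with the background strokes.

```python
import numpy as np
from collections import deque

def keep_stroked_regions(mask, fg_strokes):
    """Keep 4-connected foreground components that contain a stroke pixel.

    mask:       (H, W) int array, 1 = foreground, 0 = background.
    fg_strokes: (H, W) bool array, True where the user drew a foreground stroke.
    """
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros_like(mask)
    # Start a flood fill from every stroke pixel that lies on foreground.
    for sy, sx in zip(*np.nonzero(fg_strokes & (mask == 1))):
        if seen[sy, sx]:
            continue
        queue = deque([(sy, sx)])
        seen[sy, sx] = True
        while queue:
            y, x = queue.popleft()
            out[y, x] = 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < h and 0 <= nx < w
                        and mask[ny, nx] == 1 and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
    return out
```

Foreground components never reached by a fill are left at 0 in the output, which corresponds to merging them into the background.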
The example method described above generates feature vectors using bag of features and then segments the image with the help of these feature vectors and a classifier. This approach may work with any number of channels in a multi-energy CT system.
Results for two datasets are reported below, each with different amounts of noise. The performance of the method described is compared against state-of-the-art methods. These methods include Random Walk (RW) segmentation [3], Graph Cuts (GC) [4], Geodesic Star Convexity (GSC) prior for segmentation [6], Geodesic Graph Cuts (GGC) [7], Grabcut [5], and Maximal Similarity based Region Merging (MSRM) [8].
The following parameters were used in the current approach: the square patch size for calculating texture (n)=7×7, the number of visual words for VLAD (K)=16, and the range and domain based smoothing parameters (σr and σd) were set to 1.8 and 1.3 respectively.
The same user input (as a set of input strokes) was provided to all methods in order to avoid any variations caused by input discrepancy. While additional strokes could improve the performance of any of the methods, they would also require additional user interaction; hence the analysis was carried out with a single fixed set of input strokes from the user. In all figures, input images have white and red coloured strokes to represent foreground and background objects respectively. The segmentation results from each of the tested methods are shown using blue contours. Similarly, green was chosen to denote the manually annotated ground truth. The ground truths for these datasets were manually annotated by members of a clinical team.
For performance analysis, the first dataset used in our work is of a lamb chop (from now on referred to as the “lamb dataset”). This is an early multi-energy CT dataset and a challenging one to work with, due to the noise and the low contrast between lipid and water. The volume dimensions are 436×436×126 voxels, the cubic voxels are 0.093 mm wide, and there are four frames representing four energy channels. In the lamb dataset, we chose three classes: lipid-like, water-like, and background. In the following text, the experiments demonstrate two cases in which 1) lipid-like objects are labelled as foreground and the rest as background, 2) water-like objects are labelled as foreground and the rest as background. Since the algorithm is mainly designed to work for 2D slices, the testing was done on multiple individual slices separately.
In
The bottom row of
It can be observed that the boundaries have a slight block-like appearance in some places. This is because SLIC starts off with square patches as superpixels and adjusts their shape according to the nearby image content. Since the image in question does not have many features, most of the superpixels stayed square-like. Where some of these superpixels were misclassified, the block boundaries become visible.
The second dataset used for our performance evaluation is of a knee doped with Hexabrix (from now on referred to as the “knee dataset”). Its volume dimensions are 1280×1280×192 voxels, with cubic voxels 0.07 mm wide and five energy channels. In this dataset, the object of interest is bone. Segmenting a bone might seem a trivial task, but it has its own challenges as will be demonstrated in
For this dataset, two cases are demonstrated to compare how the methods perform at different noise levels. The top row of
Even in these cases, the methods described herein produce images closer to the ground truth when compared to other methods.
The average performance of the proposed approach on the two datasets was quantified using accuracy, F1-score, and Intersection over Union (IoU) as statistical accuracy metrics. The manually delineated and segmented regions are denoted by I and G respectively.
Accuracy is defined as the ratio of the number of correctly classified pixels to all the pixels within an image. It is the most commonly used metric for performance evaluation, and indicates the degree of similarity between the segmentation result and its manually annotated ground truth.
Another similarity index used for this evaluation is the F1-score (Dice's coefficient). Given I and G, it is defined as:

$$F_1 = \frac{2\,|I \cap G|}{|I| + |G|}$$
where |I| and |G| are the cardinalities of the sets I and G respectively (i.e. the number of elements in each set). F1-score or the Dice's coefficient not only measures the similarity between I and G, but also considers the foreground and background class balance. It computes the ratio of the overlapping region between the ground truth foreground and the foreground segmented by the algorithm to the combined foreground size.
The overlap between the segmented region and the ground truth is quantified using IoU, or the Jaccard measure. It is defined as the ratio of the number of pixels within the intersection of the two regions to the number of pixels within their union.
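As an illustration of the three metrics, the following minimal sketch computes accuracy, F1-score (Dice) and IoU (Jaccard) for a pair of binary masks. The function name `segmentation_metrics` and the toy masks are hypothetical, and the convention of returning 1.0 when both masks are empty is an assumption that is not taken from the description above.

```python
def segmentation_metrics(segmented, ground_truth):
    """Return (accuracy, f1, iou) for two equally sized 2D binary masks."""
    tp = fp = fn = tn = 0
    for seg_row, gt_row in zip(segmented, ground_truth):
        for s, g in zip(seg_row, gt_row):
            if s and g:
                tp += 1      # foreground in both masks (the intersection)
            elif s:
                fp += 1      # foreground only in the segmentation
            elif g:
                fn += 1      # foreground only in the ground truth
            else:
                tn += 1      # background in both masks

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # correctly classified / all pixels

    size_sum = 2 * tp + fp + fn           # sum of the two foreground sizes
    union = tp + fp + fn                  # size of the union of foregrounds
    # Assumed convention: two empty masks count as a perfect match.
    f1 = 2 * tp / size_sum if size_sum else 1.0
    iou = tp / union if union else 1.0
    return accuracy, f1, iou


segmented = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
accuracy, f1, iou = segmentation_metrics(segmented, ground_truth)
```

For these toy masks the intersection contains 2 pixels, each foreground contains 3 pixels and the union contains 4 pixels, so accuracy is 4/6, the F1-score is 4/6 and IoU is 2/4.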
Table I outlines the average performance of the algorithms for a given input over the two datasets. All the algorithms were given the same input except Grabcut, which takes a bounding box as an input.
The bold representation of the score indicates the best performance in each metric. Performance-wise, RW comes close to the proposed approach, but a significant difference remains. Also of note is the standard deviation, which reveals the consistency of the results over a large number of images. The proposed method has a relatively small standard deviation compared to the other methods, which indicates its stability over different images.
The computer implemented methods of the present invention are outlined as method 200 in
In use with method 200 of
Processor 304 receives the image data set from communications network 302 and implements the processing steps of the methods of the present invention together with data storage medium 305 and user input received from interface 307. Server 306 serves and receives data between computer system 303 and user interface 307.
As with system 300, computer system 603 includes processor 604, data storage medium 605 and server 606. Processor 604 receives the image data set from communications network 602 and implements the processing steps of the methods of the present invention together with data storage medium 605 and user interface 607. Server 606 serves and receives data between computer system 603 and interface 607.
The various operations of methods described above may be performed by any suitable means capable of performing the operations, such as various hardware and/or software component(s), circuits, and/or module(s). Generally, any operations illustrated in the Figures may be performed by corresponding functional means capable of performing the same or equivalent operations.
The various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, modules, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. The described functionality may be implemented in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the invention.
The steps of a method or algorithm and functions described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. If implemented in software, the functions may be stored on, or transmitted over, a tangible, non-transitory computer-readable medium as one or more instructions or code.
A software module may reside in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
A storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
Combinations of the above should also be included within the scope of computer readable media. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
The examples above relate primarily to segmentation into two regions (which may be considered ‘foreground’ and ‘background’ or may be given some other label). However, the invention may also extend to segmentation into more than two regions. For example, this may be achieved by a user identifying three or more regions, labelling the superpixels accordingly, and segmenting by similar methods to those described above. Alternatively, the process may be repeated with different foreground targets each time to ultimately yield three or more regions.
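The second alternative, repeating a two-region segmentation with a different foreground target each time, can be sketched as a merge of the resulting binary masks into one label map. This is only an illustrative sketch: the function name `merge_binary_masks`, the toy masks, and the rule that the earliest mask wins where two masks overlap are all assumptions, since the description does not specify how conflicts are resolved.

```python
def merge_binary_masks(masks):
    """Combine per-class binary masks into a single label map.

    masks : list of equally sized 2D lists of 0/1 values, one per foreground
            target, in priority order. Label 0 is background; the mask at
            index k receives label k + 1.
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    labels = [[0] * cols for _ in range(rows)]
    for class_id, mask in enumerate(masks, start=1):
        for r in range(rows):
            for c in range(cols):
                # Assumed conflict rule: a pixel keeps the first label it gets.
                if mask[r][c] and labels[r][c] == 0:
                    labels[r][c] = class_id
    return labels


# Two passes over the same slice, e.g. lipid-like then water-like targets.
lipid_mask = [[1, 1, 0],
              [0, 0, 0]]
water_mask = [[0, 1, 1],
              [0, 0, 1]]
label_map = merge_binary_masks([lipid_mask, water_mask])
```

Here the pixel claimed by both passes keeps label 1, and pixels claimed by neither pass remain background (label 0), giving three regions in total.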
Further, while the invention has been described primarily in relation to a two dimensional ‘slice’ of multi-energy CT data, it may be possible to extend the method to segmentation of three dimensional data. For example, an initial segmentation on 2D data may be extended into 3D data. Alternatively, multiple segmentations may be performed in different 2D ‘slices’ before extension into 3D data. Alternatively, the method may be performed on voxels and supervoxels rather than pixels and superpixels. In general, pixels and voxels may be considered data elements, while superpixels and supervoxels may be considered clusters of data elements (notwithstanding that the smallest possible superpixel is a single pixel, and the smallest possible supervoxel is a single voxel).
Extending the above example to supervoxels may introduce complexity that significantly slows the process. Repeating the above process slice by slice may therefore be a preferred approach to segmentation in three dimensions.
Further, if a 2D segmentation is to be extended to 3D, the initial 2D segmentation may be an interactive segmentation based on user-input markings, strokes or similar. Segmentations of further slices or otherwise extending the 2D segmentation into 3D may be performed without further user-input of markings/strokes etc. This may be done using the labels and vectors already determined, or further strokes, markings etc may be automatically created.
The segmented data resulting from the Applicant's process may be displayed to a user on any suitable display.
The segmented data may also be applied to the original data to extract all pixels belonging to a class into its own image. For example, if the original image is of a patient's chest, the segmentation result may be applied to create a new image with just one structure of interest (e.g. the heart) and a second new image with everything except that structure of interest (e.g. the heart).
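A minimal sketch of this extraction step is given below, assuming a single-channel 2D image and a binary segmentation mask of the same size. The function name `split_by_mask`, the `fill` value used for excluded pixels, and the toy data are hypothetical.

```python
def split_by_mask(image, mask, fill=0):
    """Split an image into (structure only, everything except the structure).

    image : 2D list of pixel values
    mask  : 2D list of 0/1 values (1 = pixel belongs to the structure)
    fill  : value written where a pixel is excluded from an output image
    """
    inside = [[value if m else fill for value, m in zip(img_row, mask_row)]
              for img_row, mask_row in zip(image, mask)]
    outside = [[fill if m else value for value, m in zip(img_row, mask_row)]
               for img_row, mask_row in zip(image, mask)]
    return inside, outside


image = [[5, 7],
         [9, 3]]
mask = [[1, 0],
        [0, 1]]
structure_only, remainder = split_by_mask(image, mask)
```

Every original pixel value appears in exactly one of the two outputs, so the two images together account for the whole original slice.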
Alternatively, the segmentation data may be fed directly into visualisation/analysis algorithms to include or exclude parts of the data.
Annotations and/or metadata for the original image may be created based on the regions specified by the segmentation results.
The segmentation results, and/or any derivative images/data/analysis results may be saved to any suitable storage.
While described primarily in relation to multi-energy CT scanners, the invention may also be applicable in other scanners, including scanners that provide multi-energy CT data, such as hybrid scanners (e.g. MRI and multi-energy CT) or phase-contrast CT scanners providing multi-energy CT data.
The entire disclosures of all applications, patents and publications cited above and below, if any, are herein incorporated by reference.
Reference to any prior art in this specification is not, and should not be taken as, an acknowledgement or any form of suggestion that that prior art forms part of the common general knowledge in the field of endeavour in any country in the world.
Where in the foregoing description reference has been made to integers or components having known equivalents thereof, those integers are herein incorporated as if individually set forth.
It should be noted that various changes and modifications to the presently preferred embodiments described herein will be apparent to those skilled in the art. Such changes and modifications may be made without departing from the spirit and scope of the invention and without diminishing its attendant advantages. It is therefore intended that such changes and modifications be included within the present invention.
Number | Date | Country | Kind |
---|---|---|---|
762875 | Mar 2020 | NZ | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/NZ2021/050046 | 3/22/2021 | WO |