Sequential segmentation of anatomical structures in 3D scans

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a 371 National Stage Application of PCT/EP2019/063745, filed May 28, 2019, which claims the benefit of European Application No. 18176521.5, filed Jun. 7, 2018, the contents of which are incorporated by reference herein in their entirety.

FIELD OF THE INVENTION

The present invention relates to a method of segmenting anatomical structures in volumetric scans, more specifically in medical 3D scans.

Background of the Invention

Accurate segmentation of anatomical structures in volumetric medical scans is of high interest in current clinical practice as it plays an important role in many tasks involved in computer-aided diagnosis, image-guided interventions, radio-therapy and radiology. In particular, quantitative diagnostics requires knowledge of accurate boundaries of anatomical organs.

3D deep learning segmentation approaches show promising results on various organs and modalities. Like the work of Lu et al. [11] on liver segmentation, most of these methods are built upon the 3D U-Net architecture [5]. Dou et al. [6] presented a 3D fully-convolutional architecture which boosts liver segmentation accuracy by deep supervision layers. Yang et al. [16] used adversarial training in order to gain more performance for the 3D U-Net segmentation of the liver in CT scans. Sekuboyina et al. [14] proposed a pipeline approach for both localization and segmentation of the spine in CT. Here the vertebrae segmentation is performed in a blockwise manner to overcome memory limitations and at the same time obtain a fine-grained result. A similar blockwise approach in combination with a multi-scale two-way CNN was introduced by Korez et al. [9].

Aforementioned methods resample scans originating from 3D image acquisition into volumes and apply convolutions in 3D. This involves usually downscaling in order to overcome memory limitations. Therefore, methods that process volumetric data in a slice-wise fashion gained importance. For instance, Li et al. [10] applied first a slice-wise densely connected variant of the U-Net [13] architecture for liver segmentation and refined the result by a 3D model using the auto-context algorithm. For the same task, Christ et al. [4] applied slice-wise U-Nets to obtain a rough segmentation and refined the result with Conditional Random Fields in 3D.

For proper leveraging spatial information, relying only on intra-slice data is insufficient. Addressing this issue, the above-mentioned methods applied computationally expensive 3D refinement strategies in addition to the main 2D approach.

On contrary, Bates et al. [2] and Chen et al. [3] show that combining recurrent neural networks with 2D fully convolutional approaches allows to process spatial information over slices directly. The presented methods use a U-Net variant to yield an initial segmentation, which subsequently is refined by convolutional Long Short Term Memory Networks (LSTMs) [7].

It is an aspect of the present invention to provide an improved segmentation method.

SUMMARY OF THE INVENTION

The above-mentioned aspects are realized by a method as set out in claim 1.

Specific features for preferred embodiments of the invention are set out in the dependent claims.

The method of this invention is a general, robust, end-to-end hybrid of the U-Net, time-distributed convolutions and bidirectional convolutional LSTM blocks.

The method overcomes disadvantages of current methods such as disregarding anisotropic voxel sizes, considering only intra-slice information, and memory constraints.

Furthermore, invariance to field-of-view is shown, especially to constant slice count by evaluating on two CT datasets depicting two different organs, namely liver and vertebrae. Liver segmentation is often a required step in the diagnosis of hepatic diseases while the segmentation of vertebrae is important for the identification of spine abnormalities, e.g. fractures, or image-guided spine intervention.

Further advantages and embodiments of the present invention will become apparent from the following description and drawings.

BRIEF DESCRIPTION OF THE D WINGS

FIG. 1 shows an overview of the architecture underlying the method of the present invention,

FIG. 2 is a table with detailed information on filters and shapes for input and output tensors,

FIG. 3 are the results of two-fold evaluations on vertebrae and liver,

DETAILED DESCRIPTION OF THE INVENTION

Methodology

General Setup

Let I={I₁, . . . I_n} a set of volumetric scans, where each I_iconsists of voxels (x₁,x₂,x₃)∈ custom character ³and intensities I_i(x)∈J⊂.

More specifically, each scan I_iis therefore a set of m_I_i∈N slices

$J_{I_{i}} = {J_{1}, \dots J_{m_{I_{i}}}}$

with pixels y=(y₁,y₂)∈ custom character ²and intensities J_I_i(y)∈J.

For each slice J_i∈J_I_ia set of ground truth masks are available

M_J_i:=M_J_i_,l)_l=1^m

where l corresponds to semantic class labels custom character ={l₁. . . l_m} and M_J_i,∈⊂M^m¹^×m²({0,1}), the space of all binary matrices of size m_i×m₂.

To enforce reproducibility of the input flow shapes, a new training dataset was built in the following way.

From each scan I_i∈I a subset J_I_i′⊂J_I_iof size k_J_iis selected where k_J_iis dividable by an odd number O_J_i∈N.

Subsequently, each J_I_i′ is split into order-preserving subsets of the length O_J_i, and all are combined into a new dataset I′ that now consists of the subsets of odd cardinality.

For training and evaluation purposes, the dataset I′ is split into non-overlapping sets, namely I_Train′ and I_Test′.

During training the network is consecutively passed with minibatches custom character ∈, where is a complete partition of the set I_Train′.

For each sequence I_j′∈I′, i.e. I_j′={J_p, . . . J_q} for some 1≤p,q≤|J_I_i′| and some sequence I_i, the multi-class output of the network is calculated:

understanding the network as a function custom character : I′→ (1)
(I_j′) derives for each pixel y∈J_pits semantic class l∈ in a single step with some probability, where J_pcorresponds to the middle element of the sequence I_j′.

In order to estimate and maximize this probability, a loss function is defined as follows

Λ:I′×→ (2)

that estimates the deviation (error) of the network outcome from the desired ground truth.

Using the formal notations derived by Novikov et al. [12] a loss function is defined in the following way.

For a distance function d:I′×→, weighting coefficients τ_κ,land a sequence S∈K the loss function is

Λ(S,G_S):=−Σ_l∈r_κ,l⁻¹d_l^dice(S,G_S) (3)

over the set K and the complete partition.

The distance function d_l^dicefor the Dice coefficient for a training sequence S, a feature channel l, ground-truth mask G_Sand sigmoid activation function p_l(.) can be defined as

$d_{l}^{dice} (S, G_{S}) := 2 \frac{Σ_{x \in I} χ_{π_{l} (G_{S})} (x) p_{i} (x)}{Σ_{x \in I} (χ_{π_{l} (G_{s})} (x) + p_{l} (x))}$

Where X_π_l_(G_S₎(x) is a characteristic function, i.e. X_π_l_(G_S₎(x)=1 if G_Sis 1 at the position of pixel x and 0 otherwise.

Architecture

In order to leverage spatio-temporal correlations of the order-preserving slices (elements of I_j′) and due to their sequential nature, time-distributed convolutions and bidirectional convolutional LSTM blocks have been combined in an end-to-end trainable U-Net-like hybrid architecture.

FIG. 1 shows a high-level overview of the proposed architecture.

FIG. 2 complements FIG. 1 with tensor shapes for each layer.

The network takes an odd-lengthed sequence J_i∈I_j′ as the input.

This sequence is then passed to a contraction block. Each element of the sequence is processed through the contraction block of the network independently. The contraction block consists of the repeated time-distributed convolution and max-pooling layers.

Time-distributed convolutions are typical convolutions passed to a special wrapper. Time-distributed wrappers allow application of any layer to every temporal frame (or slice) of the input independently. In the context of this work such temporal frames correspond to the elements of the training sequences extracted from the volumes. In the present architecture the wrapper is applied in all convolutional, pooling, upsampling and concatenation layers.

In order to capture spatio-temporal correlations between slices the features extracted for each element of the input sequence have been passed into a Convolutional LSTM (CLSTM) block [15] at the end of the contraction part of the network.

A bidirectional modification of the CLSTM is used with a summation operator in order to enable the network learning spatio-temporal correlations of the slices in both directions.

This CLSTM block aims at adding the dependency of the low-dimensional high abstract features.

The sequence output of the bidirectional CLSTM block is then passed to the expansion path. Each element of the sequence is processed independently.

The expansion part consists of the time-distributed convolutions and up-sampling layers. After every up-sampling layer, the features are concatenated with the corresponding features from the contraction part. When the spatial resolution of the features reaches the desired sizes, the sequence is passed to another bidirectional CLSTM block. The sequence is processed in both directions and the output is summed. Therefore this block contributes towards two goals: adding dependency for the high-dimensional features and converting the incoming sequence into a single-channeled output.

The resulting features are then passed to the (1,1) convolution layer in order to map each feature vector to the desired number of classes. The result is mapped into [0,1] range via the sigmoid activation applied to each pixel independently. This results in the segmentation of the middle element of the input sequence J_i.

Experimental Setup

Training Data and Preparation

To demonstrate the generalizability of this architecture on different anatomical organs public datasets were used for vertebrae and liver.

For vertebrae segmentation the CSI 2014 challenge train set [17] was used. It comprises of 10 CT scans covering the entire lumbar and thoracic spine and full vertebrae segmentation masks for each scan. The axial in-plane resolution varies between 0.3125 and 0.3616 mm2 and the slice thickness is 1 mm.

For liver segmentation the following two related datasets were used: 3Dircadb-01 and 3Dircadb-02 [1]. The first consists of 20 3D CT scans of 10 and 10 men with hepatic tumours in 75% cases. The second one consists of two anonymized scans with hepatic focal nodular hyperplasia. The axial in-plane resolution varied between 0.56 and 0.961 mm⁻²and the slice thickness varied between 1.0 and 4.0 mm.

The consecutive elements within the training sequences for the vertebrae segmentation were generated at the distance of 3 mms and at 5 mms for liver segmentation within the vertebrae and liver areas.

These numbers were chosen based on the maximal slice thicknesses in the datasets. In all evaluations we used the sequences of three slices (oJ_i==3, ∀J_i∈_j′).

Evaluations on both organs were performed in a two-fold manner. All slices and masks were downsampled to 128×128 imaging resolution for timely evaluation runs.

Implementation Details

All experiments were performed using Keras with TensorFlow backend in Python. We trained the networks over the loss shown by Eq. 3 using an ADAM [8] optimization algorithm with a fixed initial rate of 10-5 and the standard values of β1=0.9 and β2=0.999. Downsampling of the ground-truth masks was performed using the scikit-image library.

Results

The average Dice and Jaccard scores for two folds for vertebrae and liver segmentation are shown in FIG. 3. The first two columns show the scores for the case when the whole volume was considered, even when the organ of interest was not present. The third and fourth columns show the results only for the regions containing the organ of interest.

The focus of this invention is on the general segmentation approach therefore the final result is not post-processed because usually this procedure is task-specific and requires organ-specific tuning. In this case, as expected, for this context irrelevant bone structures were partially segmented thus lowering the scores for the full volume.

In order to demonstrate that the present network improves over its 2D variations, two more architectures were built and evaluated in the same training conditions on the Fold 1 for liver.

In the first architecture the input was changed in a way that it would take the sequences consisting of only one slice (the middle slice we aimed to segment).

In the second architecture the input dimensions were not changed but the first convolutional LSTM block was removed and replaced the second one by the aggregation layer which would sum the incoming features over the time channel.

Both architectures achieved similar Dice scores of 0.87 and 0.878 when considering the organ area only, which is significantly lower than the scores of the network of this invention.

REFERENCES

1. 3d-ircadb database. Available at https://www.ircad.fr/research/3d-ircadb-01/.

2. R. Bates, B. Irving, B. Markelc, J. Kaeppler, R. Muschel, V. Grau, and J. A. Schnabel. Extracting 3D vascular structures from microscopy images using convolutional recurrent networks. arXiv preprint arXiv:1705.09597, 2017.

3. J. Chen, L. Yang, Y. Zhang, M. Alber, and D. Z. Chen. Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation. In NIPS, pages 3036-3044, 2016.

4. P. F. Christ, M. E. A. Elshaer, F. Ettlinger, S. Tatavarty, Bickel, P. Bilic, M. Rempfler, M. Armbruster, F. Hofmann, M. DAnastasi, et al. Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields. In MICCAI, pages 415-423. Springer, 2016.

5. Ö Cicek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In MICCAI, pages 424-432, Cham, 2016. Springer International Publishing.

6. Q. Dou, H. Chen, Y. Jin, L. Yu, J. Qin, and P.-A. Heng. 3D deeply supervised network for automatic liver segmentation from CT volumes. In MICCAI, pages 149-157. Springer, 2016.

7. S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735-1780, 1997.

8. D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.

9. R. Korez, B. Likar, F. Pernu{hacek over ( )} s, and T. Vrtovec. Segmentation of pathological spines in CT images using a two-way CNN and a collision-based model. In International Workshop and Challenge on Computational Methods and Clinical Applications in Musculoskeletal Imaging, pages 95-107. Springer, 2017.

10. X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P. A. Heng. H-DenseUNet: Hybrid densely connected U-Net for liver and liver tumor segmentation from CT volumes. arXiv preprint arXiv:1709.07330, 2017.

11. F. Lu, F. Wu, P. Hu, Z. Peng, and D. Kong. Automatic 3D liver location and segmentation via convolutional neural network and graph cut. International Journal of CARS, 12(2):171-182, 2017.

12. A. A. Novikov, D. Lenis, D. Major, J. Hlad ̊uvka, M. Wimmer, and K. Bühler. Fully convolutional architectures for multi-class segmentation in chest radiographs. IEEE Transactions on Medical Imaging, PP(99):1-1, 2018.

13. O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, pages 234-241. Springer, 2015.

14. A. Sekuboyina, J. Kukacka, J. S. Kirschke, B. H. Menze, and A. Valentinitsch. Attention-driven deep learning for pathological spine segmentation. In International Workshop and Challenge on Computational Methods and Clinical Applications in Musculoskeletal Imaging, pages 108-119. Springer, 2017.

15. X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, and W.-c. Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In NIPS, NIPS, pages 802-810. MIT Press, 2015.

16. D. Yang, D. Xu, S. K. Zhou, B. Georgescu, M. Chen, S. Grbic, D. Metaxas, and D. Comaniciu. Automatic liver segmentation using an adversarial image-to-image network. In MICCAI, pages 507-515. Springer, 2017.

17. J. Yao, J. E. Burns, H. Munoz, and R. M. Summers. Detection of vertebral body fractures based on cortical shell unwrapping. In MICCAI, pages 509-516, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.

Having described in detail preferred embodiments of the current invention, it will now be apparent to those skilled in the art that numerous modifications can be made therein without departing from the scope of the invention as defined in the appending claims.

Number	Name	Date	Kind
20180240235	Mazo	Aug 2018	A1
20180260951	Yang	Sep 2018	A1
20190021677	Grbic	Jan 2019	A1
20190205606	Zhou	Jul 2019	A1
20190223725	Lu	Jul 2019	A1
20210241884	Swisher	Aug 2021	A1

Sequential segmentation of anatomical structures in 3D scans

Information

Patent Number

Date Filed

Date Issued

Inventors

Original Assignees

Examiners

Agents

CPC

Field of Search

US

CPC

International Classifications

Term Extension

Abstract

Description

Claims

Priority Claims (1)

PCT Information

US Referenced Citations (6)

Non-Patent Literature Citations (6)

Related Publications (1)

Entry
Chen Yani et al, “Accurate and Consistent Hippocampus Segmentation Through Convolutional LSTM and View Ensemble”, International Workshop on Machine Learning in Medical Imaging, Sep. 7, 2017, pp. 88-96, Lecture Notes in Computer Science vol. 10541, Springer, Cham. (Year: 2017).
International Search Report dated Jun. 21, 2019 relating to PCT/EP2019/063745, 3 pages.
Written Opinion dated Jun. 21, 2019 relating to PCT/EP2019/063745, 6 pages.
Avinash Sharma V, “Understanding Activation Functions in Neural Networks”, The Theory of Everything, Mar. 30, 2017, https://medium.com/the-theory-of-everything/understandingactivation-functions-in-neural-networks-9491262884e0.
Chen Yani et al, “Accurate and Consistent Hippocampus Segmentation Through Convolutional LSTM and View Ensemble”, International Workshop on Machine Learning in Medical Imaging, Sep. 7, 2017, pp. 88-96, Lecture Notes in Computer Science vol. 10541, Springer, Cham.
Olaf Ronneberger et al., “U-Net: Convolutional Networks for Biomedical Image Segmentation”, Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Computer Science vol. 935, May 18, 2015, pp. 234-241, Springer, Cham.