The present invention is directed to a system and method for segmenting chambers of a heart in a three dimensional image, and more particularly, to a system and method for segmenting chambers of a heart in a three dimensional image using anatomical structure localization and boundary delineation.
Cardiac computed tomography (CT) is an important imaging modality for diagnosing cardiovascular disease, as it can provide detailed anatomic information about the cardiac chambers, large vessels and coronary arteries. Multi-chamber heart segmentation is a prerequisite for global quantification of cardiac function. The complexity of cardiac anatomy, poor contrast, noise and motion artifacts make this segmentation a challenging task. Most known approaches focus only on left ventricle segmentation. Complete segmentation of all four heart chambers can help to diagnose diseases in other chambers, e.g., left atrial fibrillation or right ventricle overload, or to perform dyssynchrony analysis.
There are two tasks in a non-rigid object segmentation problem: object localization and boundary delineation. Most known approaches focus on boundary delineation based on active shape models, active appearance models, and deformable models. There are a few limitations inherent in these techniques: 1) Most of them are semiautomatic, and manual labeling of a rough position and pose of the heart chambers is needed. 2) They are likely to get stuck on strong local image evidence. Other known techniques are straightforward extensions of two dimensional (2D) image segmentation to three dimensional (3D) image segmentation: the segmentation is performed on each 2D slice and the results are combined to obtain the final 3D segmentation. However, such techniques cannot fully exploit the benefit of 3D imaging in a natural way.
Object localization is required for an automatic segmentation system and discriminative learning approaches have proved to be efficient and robust for solving 2D problems. In these methods, shape detection or localization is formulated as a classification problem: whether an image block contains the target shape or not. To build a robust system, a classifier only has to tolerate limited variation in object pose. The object is found by scanning the classifier over an exhaustive range of possible locations, orientations, scales or other parameters in an image. This searching strategy is different from other parameter estimation approaches, such as deformable models, where an initial estimate is adjusted (e.g., using the gradient descent technique) to optimize a predefined objective function.
Exhaustive searching makes the system robust against local minima; however, there are two challenges in extending learning based approaches to 3D. First, the number of hypotheses increases exponentially with the dimensionality of the parameter space. For example, there are nine degrees of freedom for the anisotropic similarity transformation, namely three translation parameters, three rotation angles and three scales. If n discrete values are searched for each dimension, the number of tested hypotheses is n^9 (even for a very coarse estimation with a small n=5, n^9=1,953,125). The computational demands are beyond the capabilities of current desktop computers. Due to this limitation, previous approaches often constrain the search to a lower dimensional space. For example, only the position and isotropic scaling (4D) are searched in the generalized Hough transformation based approach. The second challenge is that efficient features are needed to search the orientation and scale spaces. Haar wavelet features can be efficiently computed for translation and scale transformations. However, when searching for rotation parameters, one has to rotate either the feature templates or the volume, which is very time consuming. The efficiency of image feature computation becomes even more important when combined with a very large number of tested hypotheses. There is a need for an approach for detecting shapes in high dimensional images that is efficient and less computationally intensive.
A system and method for segmenting chambers of a heart in three dimensional images is disclosed. A set of three dimensional images of a heart is received. The shape of the heart in the three dimensional images is localized. Boundaries of the chambers of the heart in the localized shape are identified using steerable features.
Preferred embodiments of the present invention will be described below in more detail, wherein like reference numerals indicate like elements, with reference to the accompanying drawings:
FIGS. 5a and 5b illustrate sampling features in steerable features in accordance with an aspect of the present invention;
FIGS. 6a-6c illustrate examples of non-rigid deformation estimation for a left ventricle;
FIGS. 7a-7c illustrate a triangulated heart surface model of various heart chambers; and
The present invention is directed to a system and method for multi-chamber heart segmentation.
The acquisition device 105 may be a computed tomography (CT) imaging device or any other three-dimensional (3D) high-resolution imaging device such as a magnetic resonance (MR) scanner or ultrasound scanner.
The PC 110, which may be a portable or laptop computer, a medical diagnostic imaging system or a picture archiving communications system (PACS) data management station, includes a central processing unit (CPU) 125 and a memory 130 connected to an input device 150 and an output device 155. The CPU 125 includes a multi-chamber heart segmentation module 145 that includes one or more methods for segmenting a heart in three dimensional medical images to be discussed hereinafter. Although shown inside the CPU 125, the multi-chamber heart segmentation module 145 can be located outside the CPU 125.
The memory 130 includes a random access memory (RAM) 135 and a read-only memory (ROM) 140. The memory 130 can also include a database, disk drive, tape drive, etc., or a combination thereof. The RAM 135 functions as a data memory that stores data used during execution of a program in the CPU 125 and is used as a work area. The ROM 140 functions as a program memory for storing a program executed in the CPU 125. The input 150 is constituted by a keyboard, mouse, etc., and the output 155 is constituted by a liquid crystal display (LCD), cathode ray tube (CRT) display, printer, etc.
The operation of the system 100 can be controlled from the operator's console 115, which includes a controller 165, e.g., a keyboard, and a display 160. The operator's console 115 communicates with the PC 110 and the acquisition device 105 so that image data collected by the acquisition device 105 can be rendered by the PC 110 and viewed on the display 160. The PC 110 can be configured to operate and display information provided by the acquisition device 105 absent the operator's console 115, by using, e.g., the input 150 and output 155 devices to execute certain tasks performed by the controller 165 and display 160.
The operator's console 115 may further include any suitable image rendering system/tool/application that can process digital image data of an acquired image dataset (or portion thereof) to generate and display images on the display 160. More specifically, the image rendering system may be an application that provides rendering and visualization of medical image data, and which executes on a general purpose or specific computer workstation. The PC 110 can also include the above-mentioned image rendering system/tool/application.
In accordance with an embodiment of the present invention, a system and method for automatic segmentation of 3D cardiac images is disclosed. Learned discriminative object models are used to exploit a large database of annotated 3D medical images. The segmentation is performed in two steps: anatomical structure localization and boundary delineation. Marginal Space Learning (MSL) is used to solve the 9-dimensional similarity search problem for localizing the heart chambers. MSL reduces the number of testing hypotheses by about six orders of magnitude as will be described in greater detail hereinafter. Steerable image features which incorporate orientation and scale information into the distribution of sampling points allow for the avoidance of time-consuming volume data rotation operations. After determining the similarity transformation of the heart chambers, the 3D shape is estimated through learning-based boundary delineation.
b) shows a view of the heart that has the left atrium (LA) 206, a portion of RV 210 and the LV 208.
MSL incrementally learns classifiers on projected sample distributions. As the dimensionality increases, the valid or positive space region becomes more restricted by previous marginal space classifiers. The estimation is split into three problems: translation estimation, translation-orientation estimation, and full similarity estimation.
Another advantage to using MSL for shape detection is that different features or learning methods can be used at each step. For example, in the translation estimation step, rotation is treated as an intra-class variation, so 3D Haar features can be used for detection. In the translation-orientation and similarity transformation estimation steps, steerable features are used, which will be described in further detail hereinafter. Steerable features provide a very flexible framework in which a few points are sampled from the volume under a special pattern, and a few local features, such as voxel intensity and gradient, are extracted for each sampling point. To evaluate the steerable features under a specified orientation, only the sampling pattern needs to be steered; no volume rotation is involved.
After the similarity transformation estimation, an initial estimate of the non-rigid shape is obtained. Learning based 3D boundary detection is used to guide the shape deformation in the active shape model framework. Again, steerable features are used to train local detectors and find the boundary under any orientation, therefore avoiding time consuming volume rotation.
In many instances, the posterior distribution of the object to be detected, e.g., the heart, is clustered in a small region of the high dimensional parameter space. It is not necessary to search the whole space uniformly and exhaustively. In accordance with an embodiment of the present invention, an efficient parameter search method, Marginal Space Learning (MSL), is used to search such a clustered space. In MSL, the dimensionality of the search space is gradually increased. For purposes of explanation, Ω is the space where the solution to the given problem exists and p(Ω) is the true probability that needs to be learned. The learning and computation are performed in a sequence of marginal spaces
Ω1⊂Ω2⊂ . . . ⊂Ωn=Ω (1)
such that Ω1 is a low dimensional space (e.g., a 3D translation instead of a 9D similarity transformation), and for each k, dim(Ωk)−dim(Ωk−1) is small. A search in the marginal space Ω1 using the learned probability model finds a subspace Π1⊂Ω1 containing the most probable values and discards the rest of the space. The restricted marginal space Π1 is then extended to Π1×X1⊂Ω2, where X1 denotes the newly added dimensions. Another stage of learning and testing is performed on Π1×X1, obtaining a restricted marginal space Π2⊂Ω2, and the procedure is repeated until the full space Ω is reached. At each step, the restricted space Πk is one or two orders of magnitude smaller than Πk−1×Xk. This results in a very efficient method with minimal loss in performance.
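The progression through marginal spaces can be illustrated with a toy sketch. This is an assumed, simplified example, not the actual MSL implementation: quadratic scoring lambdas stand in for the learned classifiers, one dimension is added per stage, and `marginal_space_search` is a hypothetical helper.

```python
def marginal_space_search(stages, keep):
    """Greedy coarse-to-fine search over nested marginal spaces.

    stages: list of (values, score) pairs; stage k's score callable rates a
            partial hypothesis tuple of length k+1 (a stand-in for a
            trained classifier).
    keep:   number of top candidates retained after each stage.
    """
    candidates = [()]
    for values, score in stages:
        # Extend every surviving candidate with the new dimension ...
        extended = [c + (v,) for c in candidates for v in values]
        # ... and keep only the highest-scoring hypotheses.
        extended.sort(key=score, reverse=True)
        candidates = extended[:keep]
    return candidates

# Toy 2D example: the true optimum is (3, 7).
xs = range(10)
ys = range(10)
stage1 = (xs, lambda h: -(h[0] - 3) ** 2)                     # marginal space Ω1
stage2 = (ys, lambda h: -(h[0] - 3) ** 2 - (h[1] - 7) ** 2)   # full space Ω2

best = marginal_space_search([stage1, stage2], keep=3)[0]
print(best)  # (3, 7)
```

Only 10 + 3×10 = 40 hypotheses are scored here, versus 100 for an exhaustive 2D grid; the savings grow with each added dimension.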
Global features, such as 3D Haar wavelet features, are effective at capturing the global information (e.g., orientation and scale) of an object. Prealignment of the image or volume is important for a learning based approach. However, it is very time consuming to rotate a 3D image, so 3D Haar wavelet features are not efficient for orientation estimation. Local features are fast to evaluate, but they lose the global information of the whole object. A new framework, steerable features, can capture the orientation and scale of the object while remaining very efficient.
A few points from the image are sampled under a special pattern. A few local features are extracted for each sampling point, such as voxel intensity and gradient.
For each sampling point, a set of local features is extracted based on the intensity and the gradient. For example, given a sampling point (x, y, z) with intensity I and gradient g=(gx, gy, gz), the following features are used: I, √I, I², I³, log I, gx, gy, gz, ∥g∥, ∥g∥², ∥g∥³, log ∥g∥, etc. In total, there are 24 local features for each sampling point. If there are P sampling points (often on the order of a few hundred to a thousand), a feature pool containing 24×P features is obtained. These features are used to train simple classifiers, and a Probabilistic Boosting Tree (PBT) combines the simple classifiers into a strong classifier for the given parameters.
Instead of aligning the volume to the hypothesized orientation to extract Haar wavelet features, the sampling pattern is steered, hence the name “steerable features”. In the steerable feature framework, each feature is local, therefore efficient. The sampling pattern is global to capture the orientation and scale information. In this way, it combines the advantages of both global and local features.
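The steering of the sampling pattern can be sketched as follows. This is an assumed, simplified illustration: a single rotation angle stands in for the three Euler angles, only four of the 24 per-point features are computed, and `rotate_z` and `steerable_features` are hypothetical helpers.

```python
import math

def rotate_z(p, theta):
    """Rotate point p=(x, y, z) about the z-axis; a full implementation
    would compose three Euler-angle rotations (psi, phi, theta)."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def steerable_features(volume, center, pattern, theta):
    """Sample local features under a steered pattern: the pattern points
    are rotated and translated, while the volume itself is never rotated."""
    feats = []
    for p in pattern:
        dx, dy, dz = rotate_z(p, theta)
        x = int(round(center[0] + dx))
        y = int(round(center[1] + dy))
        z = int(round(center[2] + dz))
        i = volume[z][y][x]                 # voxel intensity I
        # Four of the 24 per-point features listed in the text:
        feats += [i, math.sqrt(abs(i)), i * i, i ** 3]
    return feats

# Tiny 4x4x4 volume whose intensity equals the x coordinate.
vol = [[[x for x in range(4)] for _y in range(4)] for _z in range(4)]
pattern = [(1, 0, 0), (-1, 0, 0)]           # two sampling points
f = steerable_features(vol, (1, 1, 1), pattern, theta=0.0)
print(f[0], f[4])  # intensities at x=2 and x=0 -> 2 0
```

Evaluating a different orientation hypothesis only changes `theta`; the voxel data stay fixed, which is the efficiency argument made above.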
In accordance with an embodiment of the present invention, a method for segmenting chambers of a heart using a 3D object localization scheme based on MSL and steerable features will now be described.
Given a set of candidates, they are split into two groups, positive and negative, based on their distance to the ground truth (step 804). The error in object position and scale estimation is not directly comparable with that of orientation estimation. Therefore, a normalized distance measure is defined using the search step size.
E=max_i |V_i^e−V_i^t|/s_i (2)

where V_i^e is the estimated value for dimension i, V_i^t is the ground truth, and s_i is the search step size for dimension i. A sample is regarded as positive if E≦1.0, and all the others are negative samples. The search step for position estimation is one voxel, so a positive sample (X, Y, Z) should satisfy
max{|X−Xi|,|Y−Yi|,|Z−Zi|}≦1 voxel (3)
where (Xi, Yi, Zi) is the ground truth of the object center.
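The labeling rule above can be sketched as follows. The max-based form of the normalized distance is an assumption consistent with Eq. 3, and `normalized_error` and `label_candidates` are hypothetical helpers, not the actual training code.

```python
def normalized_error(estimate, truth, steps):
    """Per-dimension error divided by the search step size, aggregated
    with max, so E <= 1.0 means every dimension is within one search
    step of the ground truth."""
    return max(abs(e - t) / s for e, t, s in zip(estimate, truth, steps))

def label_candidates(candidates, truth, steps):
    """Split candidates into positive (E <= 1.0) and negative samples."""
    pos, neg = [], []
    for c in candidates:
        (pos if normalized_error(c, truth, steps) <= 1.0 else neg).append(c)
    return pos, neg

# Position search with a one-voxel step (Eq. 3): a candidate is positive
# iff max(|X - Xi|, |Y - Yi|, |Z - Zi|) <= 1 voxel.
truth = (10, 20, 30)
steps = (1, 1, 1)
cands = [(10, 20, 30), (11, 20, 29), (13, 20, 30)]
pos, neg = label_candidates(cands, truth, steps)
print(len(pos), len(neg))  # 2 1
```

The same helper covers the later stages: only `steps` changes (e.g., 0.2 radians per orientation dimension).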
Given a set of positive and negative training samples, 3D Haar wavelet features are extracted (step 806) and a classifier is trained using the probabilistic boosting tree (PBT) (step 808). Given a trained classifier, a training volume is scanned and a small number of candidates (e.g., 100) is preserved, such that the solution is among the top hypotheses.
An example of how the position-orientation and similarity transformation estimators are trained (step 810) will now be described. For a given volume, there are 100 candidates (Xi, Yi, Zi), i=1, . . . , 100, for the object position. The position and orientation are then estimated. The hypothesized parameter space is six dimensional, so the dimension of the candidates needs to be augmented. For each position candidate, the orientation space is scanned uniformly to generate the hypotheses for orientation estimation. It is well known that an orientation in 3D can be represented by three Euler angles, ψ, φ, and θ. The orientation space is scanned using a step size of 0.2 radians (approximately 11 degrees). It is to be understood by those skilled in the art that a different step size can be used without departing from the scope or spirit of the present invention.
Each candidate (Xi, Yi, Zi) is augmented with N (e.g., about 1000) hypotheses about orientation, (Xi, Yi, Zi, ψj, φj, θj), j=1 . . . N. Some hypotheses are close to the ground truth (positive) and others are far away (negative). The learning goal is to distinguish the positive and negative samples using image features, i.e., steerable features. A hypothesis (X, Y, Z, ψ, φ, θ) is regarded as a positive sample if it satisfies both Eq. 3 and
max{|ψ−ψi|,|φ−φi|,|θ−θi|}≦0.2 (4)
where (ψi, φi, θi) represent the orientation ground truth. All the other hypotheses are regarded as negative samples.
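The augmentation and labeling of orientation hypotheses can be sketched as follows. This is an assumed, simplified example: the angle ranges are illustrative, so the hypothesis count differs from the roughly 1000 reported in the text, and `orientation_hypotheses`, `augment`, and `is_positive` are hypothetical helpers.

```python
import itertools
import math

def orientation_hypotheses(step=0.2, limit=math.pi):
    """Uniform grid over the three Euler angles with the 0.2 rad
    (~11 degree) step from the text; the range [0, pi) is illustrative."""
    n = int(limit / step)
    grid = [k * step for k in range(n)]
    return list(itertools.product(grid, grid, grid))

def augment(position_candidates, step=0.2):
    """Augment each position candidate (X, Y, Z) with every orientation
    hypothesis, yielding 6-D hypotheses (X, Y, Z, psi, phi, theta)."""
    hyps = orientation_hypotheses(step)
    return [c + o for c in position_candidates for o in hyps]

def is_positive(h, truth, step=0.2):
    """Eq. 3 and Eq. 4: within one voxel of the ground truth in position
    and within one angular step (0.2 rad) in each Euler angle."""
    pos_ok = max(abs(h[i] - truth[i]) for i in range(3)) <= 1
    ang_ok = max(abs(h[i] - truth[i]) for i in range(3, 6)) <= step
    return pos_ok and ang_ok

cands = [(10, 20, 30)]
hyps = augment(cands)
truth = (10, 20, 30, 0.4, 0.2, 0.0)
print(len(hyps))  # 3375 hypotheses for one position candidate (15 steps per angle)
```

A trained classifier would then score all of these hypotheses, which is why efficient features matter at this stage.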
Since aligning 3D Haar wavelet features to a specified orientation is not efficient, steerable features are used in the following steps. A classifier is trained using PBT and steerable features. The trained classifier is used to prune the hypotheses to preserve only a few candidates.
The similarity transformation estimation step, in which scale is added, is similar to the position-orientation estimation step, except that learning is performed in the full nine dimensional similarity transformation space. The dimension of each candidate is augmented by scanning the scale subspace uniformly and exhaustively.
Now the testing procedure on an unseen volume will be described. The input volume is first normalized to 3 mm isotropic resolution, and all voxels are scanned using the trained position estimator. A predetermined number of top candidates (e.g., 100), (Xi, Yi, Zi), i=1 . . . 100, is kept. Each candidate is augmented with N (e.g., 1000) hypotheses about orientation, (Xi, Yi, Zi, ψj, φj, θj), j=1 . . . N. Next, the trained translation-orientation classifier is used to prune these 100×N hypotheses, and the top 50 candidates are retained, (X̂i, Ŷi, Ẑi, ψ̂i, φ̂i, θ̂i), i=1 . . . 50. Similarly, each candidate is augmented with M (also about 1000) hypotheses about scaling, and the trained classifier is used to rank these 50×M hypotheses. The goal is to obtain a single estimate of the similarity transformation. In order to aggregate the multiple candidates, the top K (K=100) candidates are averaged.
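The prune-and-augment cascade can be sketched with toy scoring functions standing in for the trained classifiers. This is an assumed, simplified example (one dimension per stage rather than full 3D position, orientation, and scale; `detect` and `aggregate` are hypothetical helpers).

```python
def detect(hypotheses, classifier, keep):
    """Score hypotheses with a classifier (any callable here) and keep
    only the top candidates, as each cascade stage does."""
    return sorted(hypotheses, key=classifier, reverse=True)[:keep]

def aggregate(candidates, k):
    """Average the top k candidates dimension-wise to obtain a single
    final estimate."""
    top = candidates[:k]
    dims = len(top[0])
    return tuple(sum(c[d] for c in top) / len(top) for d in range(dims))

# Toy cascade: stage 1 estimates one parameter, stage 2 adds a second.
positions = [(x,) for x in range(100)]
score_pos = lambda h: -abs(h[0] - 42)
cand = detect(positions, score_pos, keep=10)

augmented = [c + (o,) for c in cand for o in range(20)]
score_ori = lambda h: -abs(h[0] - 42) - abs(h[1] - 7)
cand = detect(augmented, score_ori, keep=5)

estimate = aggregate(cand, k=5)
print(estimate)  # (42.0, 7.0)
```

Averaging the top candidates, rather than taking only the single best one, smooths out scoring noise around the true parameters.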
In terms of computational complexity, for translation estimation all voxels (about 260,000 for a small 64×64×64 volume at the 3 mm resolution) are scanned for possible object positions. There are about 1000 hypotheses each for orientation and scale. If the parameter space were searched uniformly and exhaustively, about 2.6×10^11 hypotheses would have to be tested. By using MSL, only 4.1×10^5 (260,000+100×1000+50×1000) hypotheses are tested, and the amount of testing is reduced by almost six orders of magnitude.
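The hypothesis counts quoted above can be checked arithmetically:

```python
import math

# Counts from the text: a 64x64x64 volume at 3 mm resolution has about
# 260,000 voxels; roughly 1000 hypotheses each for orientation and scale.
voxels = 64 ** 3                  # 262,144 position hypotheses
n_orientation = 1000
n_scale = 1000

exhaustive = voxels * n_orientation * n_scale          # uniform 9-D search
msl = voxels + 100 * n_orientation + 50 * n_scale      # MSL cascade

print(f"{exhaustive:.1e} vs {msl:.1e}")                # 2.6e+11 vs 4.1e+05
print(round(math.log10(exhaustive / msl), 1))          # 5.8 orders of magnitude
```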
After the first stage, the position, orientation and scale of the object are obtained. The mean shape is aligned with the estimated transformation to get a rough estimate of the object shape.
A set of local boundary detectors is trained using the steerable features with the regular sampling pattern.
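The boundary refinement step can be sketched as follows. This is an assumed, simplified illustration: an analytic stand-in replaces the trained boundary detector, the projection of the deformed surface back onto the active shape model's shape subspace is omitted, and `refine_boundary` is a hypothetical helper.

```python
def refine_boundary(points, normals, detector, search_range=5, step=1.0):
    """Move each surface point along its normal to the offset where the
    boundary detector (a trained classifier in the text; here any
    callable on a 3-D position) gives the strongest response."""
    refined = []
    for p, n in zip(points, normals):
        offsets = [k * step for k in range(-search_range, search_range + 1)]
        best = max(
            offsets,
            key=lambda d: detector((p[0] + d * n[0],
                                    p[1] + d * n[1],
                                    p[2] + d * n[2])),
        )
        refined.append((p[0] + best * n[0],
                        p[1] + best * n[1],
                        p[2] + best * n[2]))
    return refined

# Toy detector: the true boundary is the sphere of radius 10 at the origin.
detector = lambda q: -abs((q[0]**2 + q[1]**2 + q[2]**2) ** 0.5 - 10.0)

points = [(8.0, 0.0, 0.0), (0.0, 13.0, 0.0)]   # initial (misaligned) mesh
normals = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # outward surface normals
refined = refine_boundary(points, normals, detector)
print(refined)  # [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
```

In the full method this per-point update alternates with a shape-subspace projection, which keeps the deformed mesh a plausible heart shape.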
An example will now be described which demonstrates the performance of the proposed method for multi-chamber localization and delineation in cardiac CT images.
Having described embodiments for a system and method for segmenting chambers of a heart in a three dimensional image, it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as defined by the appended claims. Having thus described the invention with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.
This application claims the benefit of U.S. Provisional Application Ser. No. 60/827,235, filed Sep. 28, 2006, and Ser. No. 60/913,353, filed Apr. 23, 2007, both of which are incorporated by reference in their entireties.
Number | Date | Country
---|---|---
20080101676 A1 | May 2008 | US

Number | Date | Country
---|---|---
60827235 | Sep 2006 | US
60913353 | Apr 2007 | US