The present invention relates to 3D object detection in images, and more particularly, to automated Ileo-Cecal Valve (ICV) detection in colon CT data using incremental parameter learning.
Detecting and segmenting human anatomic structures in 3D medical image volumes (e.g., CT, MRI, etc.) is a challenging problem, which is typically more difficult than detecting anatomic structures in 2D images. Human anatomic structures are highly deformable by nature, leading to large intra-class variation in the shape, appearance, and pose (orientation) of such structures in 3D medical images. Furthermore, the pose of an anatomic structure is typically unknown in advance of detection. If the pose of an anatomic structure were known prior to detection, it would be possible to train a model for the same category of anatomic structure with a fixed pose specification and pre-align all testing data with the known pose information to evaluate their fitness against the learned model. However, in order to determine the pose configuration of an anatomic structure, the structure itself must first be detected, because pose estimation is only meaningful where the structure exists. Accordingly, a method for simultaneous detection and registration of 3D anatomic structures is needed.
Many three-dimensional (3D) detection and segmentation problems require searching in a high-dimensional space. For example, a 3D similarity transformation is characterized by nine parameters: three position parameters, three orientation parameters, and three scale parameters. Exhaustively searching this entire space to detect an object is very expensive, and searching over all of these parameters becomes computationally prohibitive even when coarse-to-fine strategies are used.
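To make the cost concrete, the sketch below counts the box hypotheses a uniform grid search would evaluate, assuming a hypothetical 20 samples per parameter. For contrast it also counts a staged search of the kind performed by incremental parameter learning, keeping a hypothetical m = 100 candidates between stages. The numbers are illustrative only and are not taken from any experiment.

```python
# Illustrative count of box hypotheses; n (samples per parameter) and m
# (candidates kept between stages) are hypothetical values, not from the text.


def exhaustive_count(n: int) -> int:
    """Boxes tested by a full grid over all 9 parameters (3 position,
    3 orientation, 3 scale), with n samples per parameter."""
    return n ** 9


def sequential_count(n: int, m: int) -> int:
    """Boxes tested when the 3 position parameters are searched first, then
    the 3 scale parameters, then the 3 orientation parameters, keeping only
    the m best candidates between stages."""
    return n ** 3 + m * n ** 3 + m * n ** 3


print(exhaustive_count(20))       # 512000000000
print(sequential_count(20, 100))  # 1608000
```

Even with a modest grid, the staged search touches several orders of magnitude fewer boxes, which is the motivation for searching the subspaces in sequence.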
The Ileo-Cecal Valve (ICV) is a small anatomic structure connecting the small and large intestines in the human body. The normal functionality of the ICV (opening and closing on demand) allows food to pass into the large intestine (i.e., colon) from the small intestine. The ICV being stuck in either the open or closed position can cause serious medical consequences. Furthermore, detecting the ICV in 3D computed tomography (CT) volumes is important for accurate colon segmentation and for distinguishing false positives from polyps in colon cancer diagnosis. The size of the ICV is sensitive to the weight of the patient and whether the ICV is healthy or diseased. Because the ICV is part of the colon, which is highly deformable, the position and orientation of the ICV can vary greatly. Due to large variations in the position, size, and orientation of the ICV, detecting the ICV in CT volumes can be very difficult. Accordingly, a method for automatically detecting the size, position, and orientation of the ICV is needed.
The present invention addresses 3D object detection in images. Embodiments of the present invention are directed to automatic Ileo-Cecal Valve (ICV) detection in 3D computed tomography (CT) images. The detection method of the present invention allows a full nine-degree-of-freedom (DOF) search to locate an object with its optimal configuration (3D for translation, 3D for rotation, and 3D for scale).
In one embodiment of the present invention, an incremental parameter learning method is used for ICV detection in 3D CT volumes. A 3D training CT volume is received. A first classifier is trained which generates a number of ICV position box candidates for the 3D training CT volume from a set of initial ICV box candidates. A second classifier is trained which generates a number of ICV position and scale box candidates for the 3D training CT volume from the classifier-verified ICV position box candidates. A third classifier is trained which detects a position, scale, and orientation of a 3D box bounding the ICV in the 3D training volume from the classifier-verified ICV position and scale box candidates. An orifice classifier can also be trained which generates a number of orifice candidate surface voxels from the 3D training CT volume, and an initial orientation classifier can be trained which generates the set of initial ICV box candidates from the orifice candidate voxels.
In another embodiment of the present invention, ICV detection in a 3D CT image can be performed by detecting initial box candidates for the ICV based on an ICV orifice, and then detecting a box bounding the ICV in the 3D CT volume by sequentially detecting possible locations, scales, and orientations of the box using incremental parameter learning based on the initial box candidates. To detect the initial box candidates, a number of ICV orifice candidate voxels can be detected in the 3D CT volume using a trained 3D point detector. An orientation of a 3D box centered at each orifice candidate voxel can be aligned with the gradient vector at that voxel, and testing boxes can be generated by rotating the 3D box centered at each orifice candidate voxel within the plane orthogonal to the corresponding gradient vector. A number of the testing boxes can be detected as initial box candidates using a trained 3D box detector. To detect the box bounding the ICV using incremental parameter learning, first testing boxes can be generated by shifting the center location of each initial box candidate, and a number of the first testing boxes can be detected as ICV position box candidates using a first trained classifier. Second testing boxes can be generated by varying the scale of each ICV position box candidate, and a number of the second testing boxes can be detected as ICV position and scale box candidates using a second trained classifier. Third testing boxes can be generated by adding disturbances to the orientation of each ICV position and scale box candidate, and one of the third testing boxes can be detected as the box bounding the ICV in the 3D CT volume.
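The in-plane rotation of the initial boxes can be sketched as follows. This is a minimal illustration, assuming that the box's third axis is the one aligned with the gradient vector and that a hypothetical `num_angles` evenly spaced rotations are tested; neither choice is specified in the text, and the helper names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = List[float]


def _normalize(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def gradient_aligned_frames(gradient: Sequence[float],
                            num_angles: int = 12) -> List[Tuple[Vec3, Vec3, Vec3]]:
    """Return (x, y, z) box axes with z along the gradient and (x, y) rotated
    through num_angles evenly spaced angles inside the plane orthogonal to it."""
    z = _normalize(gradient)
    # Any vector not parallel to z can seed the in-plane axes.
    helper = [1.0, 0.0, 0.0] if abs(z[0]) < 0.9 else [0.0, 1.0, 0.0]
    x = _normalize(_cross(helper, z))
    y = _cross(z, x)  # already unit length; (x, y, z) is right-handed
    frames = []
    for k in range(num_angles):
        phi = 2.0 * math.pi * k / num_angles
        c, s = math.cos(phi), math.sin(phi)
        # Rotate the in-plane axes about z; z itself is unchanged.
        xr = [c * xi + s * yi for xi, yi in zip(x, y)]
        yr = [-s * xi + c * yi for xi, yi in zip(x, y)]
        frames.append((xr, yr, z))
    return frames
```

A zero gradient (for example, in a perfectly flat region) would make the normalization divide by zero, so such voxels would have to be skipped before calling this function.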
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
The present invention is directed to a method for 3D object detection and registration in images. A particular embodiment of the present invention is directed to Ileo-Cecal Valve (ICV) detection in computed tomography (CT) image volumes. Embodiments of the present invention are described herein to give a visual understanding of the ICV detection method. A digital image is often composed of digital representations of one or more objects (or shapes). The digital representation of an object is often described herein in terms of identifying and manipulating the objects. Such manipulations are virtual manipulations accomplished in the memory or other circuitry/hardware of a computer system. Accordingly, it is to be understood that embodiments of the present invention may be performed within a computer system using data stored within the computer system.
According to an embodiment of the present invention, incremental parameter learning is used for simultaneous detection and registration of a 3D object in a 3D image (e.g., CT image, MRI image, etc.). Incremental parameter learning is based on a sequence of binary encodings of projected true positives from labeled objects in a set of training data. That is, the encoding enforces that the global optimum in the full parameter space is also an optimum of its projection in each corresponding subspace. The encoding is performed using an iterative learning method. At each step of encoding, new object samples are extracted by scanning the object's configuration parameter in the current learning subspace, based on the candidate hypotheses detected in the preceding step. The distances from the extracted samples to their corresponding spatially labeled objects (the ground truth to be learned) are then used to separate these training samples into positive and negative sets. This ensures that the projection of the global optimum is an optimum in each subspace during training, so that the projections of the global optimum can be detected sequentially through the subspaces in testing, and the global optimum is finally recovered. This process is repeated until the full object configuration parameter space has been explored. Each encoding step can be formulated as a binary classification problem, which can be implemented using a probabilistic boosting tree (PBT) algorithm.
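$$\Omega_1 : \{\Omega_T\} \subset \Omega_2 : \{\Omega_T, \Omega_S\} \subset \Omega_3 : \{\Omega_T, \Omega_S, \Omega_R\} \qquad (1)$$
where $\Omega_3 = \Omega$, or, more generally,
$$\Omega_1 \subset \Omega_2 \subset \cdots \subset \Omega_n = \Omega \qquad (2)$$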
In equations (1) and (2), the order of ΩS and ΩR can be switched, but ΩT should be learned first, because the object's size and orientation can only be optimized at a location where the object has been found.
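The training procedure can be outlined as a loop over the nested subspaces of equations (1) and (2). In this sketch, a single training volume with one ground-truth box is assumed for brevity, and `expand`, `dist`, and `fit` are hypothetical callables standing in for the per-subspace sampling, the box distance, and a PBT learner trained on steerable features, respectively. None of them is an actual PBT implementation.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[tuple, tuple, tuple]  # (T, S, R): position, scale, orientation
Scorer = Callable[[Box], float]   # stand-in for a trained classifier


def train_incremental(
    candidates: List[Box],
    truth: Box,
    stages: Sequence[Tuple[Callable[[Box], List[Box]], float, float]],
    dist: Callable[[Box, Box], float],
    fit: Callable[[List[Box], List[Box]], Scorer],
    m: int,
) -> List[Scorer]:
    """Learn one classifier per subspace, in the order Omega_T, Omega_S, Omega_R."""
    classifiers: List[Scorer] = []
    for expand, pos_thr, neg_thr in stages:
        # Scan the current subspace around every surviving candidate.
        samples = [box for c in candidates for box in expand(c)]
        # Label samples by distance to the ground truth; samples falling
        # between the two thresholds are left unlabeled.
        positives = [b for b in samples if dist(b, truth) < pos_thr]
        negatives = [b for b in samples if dist(b, truth) > neg_thr]
        score = fit(positives, negatives)
        classifiers.append(score)
        # The trained classifier's top m outputs seed the next subspace.
        candidates = sorted(samples, key=score, reverse=True)[:m]
    return classifiers
```

Passing the stages in the order ΩT, ΩS, ΩR reproduces the nesting of equation (1); swapping the last two stages corresponds to the permitted exchange of ΩS and ΩR.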
For training, a set of 3D objects is labeled in training data volumes with bounding boxes {T, S, R}. This set of 3D objects is represented by parameter box 102 in
$$\mathrm{dist}\big((T_i, S^*, R^*), (T_t, S_t, R_t)\big) = \lVert C_i - C_t \rVert \qquad (3)$$
where Ci is the geometrical center of the sampling box (Ti, S*, R*) and Ct is the geometrical center of the ground truth box (Tt, St, Rt).
The box samples {(T1, S*, R*); (T2, S*, R*); . . . ; (Tn, S*, R*)} for each data volume are then divided into a positive training set ΦT+ (114) or a negative training set ΦT− (116) based on the distance metric dist((T, S*, R*), (Tt, St, Rt)). In particular, the box samples {(T1, S*, R*); (T2, S*, R*); . . . ; (Tn, S*, R*)} are divided into the positive training set ΦT+ (114) if
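$$\mathrm{dist}\big((T_i, S^*, R^*), (T_t, S_t, R_t)\big) < \theta_1 \qquad (4)$$
and the negative training set ΦT− (116) if
$$\mathrm{dist}\big((T_i, S^*, R^*), (T_t, S_t, R_t)\big) > \theta_2 \qquad (5)$$
where $\theta_2 > \theta_1$.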
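Equations (3) through (5) can be sketched as follows, assuming boxes are stored as (T, S, R) tuples in which T is the box's geometric center. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3, Vec3]  # (T, S, R); T is assumed to be the box center


def center_distance(a: Box, b: Box) -> float:
    """Equation (3): Euclidean distance between the two box centers."""
    return math.dist(a[0], b[0])


def split_samples(samples: Sequence[Box], truth: Box,
                  theta1: float, theta2: float) -> Tuple[List[Box], List[Box]]:
    """Equations (4) and (5): positives lie within theta1 of the ground truth,
    negatives lie farther than theta2. Samples in between are discarded."""
    if theta2 <= theta1:
        raise ValueError("theta2 must exceed theta1")
    positives = [s for s in samples if center_distance(s, truth) < theta1]
    negatives = [s for s in samples if center_distance(s, truth) > theta2]
    return positives, negatives
```

Because θ2 > θ1, samples at an intermediate distance are excluded from both sets, which keeps ambiguous near-misses out of training. The ICV embodiment gives θ1 = 5 and θ2 = 25 voxels as example values.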
The positive training set ΦT+ (114) and the negative training set ΦT− (116) are used by a boosting-based probabilistic binary learner 118, such as a probabilistic boosting tree (PBT), to train a first classifier PT. Steerable features are calculated from each 3D bounding box and its corresponding data volume for the PBT training. Based on the steerable features, the first classifier PT can determine a probability for sampled (in training) or scanned (in testing) object boxes. The first classifier PT assigns high positive-class probability values (close to 1) to boxes that are close to their respective labeled object boxes and low values (close to 0) to boxes that are distant from their respective labeled object boxes. Once trained, the first classifier PT is used to classify the sampled box candidates {(T1, S*, R*); (T2, S*, R*); . . . ; (Tn, S*, R*)}, and the top M candidates with the highest output probabilities are retained as {(T1′, S*, R*); (T2′, S*, R*); . . . ; (TM′, S*, R*)}. If there is only one object per volume and the training function is perfectly learned by a classifier, M=1 is sufficient to achieve correct detection. In practice, M can be set to 50 to 100 for all intermediate detection steps to improve robustness, so that multiple detected hypotheses are maintained until the final result.
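Retaining the top M candidates is a partial sort on classifier output. A minimal sketch, in which `probability` is a hypothetical stand-in for the trained classifier PT:

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

B = TypeVar("B")


def keep_top_m(candidates: Iterable[B], probability: Callable[[B], float],
               m: int) -> List[B]:
    """Return the m candidates with the highest classifier probability,
    ordered from most to least probable."""
    return heapq.nlargest(m, candidates, key=probability)
```

`heapq.nlargest` keeps a heap of size m rather than sorting every sample, which matters when M (for example, 50 to 100) is much smaller than the number of scanned boxes.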
The M intermediate detections (candidates) resulting from the first classifier PT are used as a basis for step 120. At step 120, each candidate (Ti′, S*, R*), i=1, 2, . . . , M, is augmented into n samples: {(Ti′, S1, R*); (Ti′, S2, R*); . . . ; (Ti′, Sn, R*)}. That is, for each candidate (Ti′, S*, R*), ΩS is searched by scanning n size samples, while the orientation parameter ΩR is held at the mean value R*. This is shown at box 122 of
$$\mathrm{dist}\big((T_i', S_j, R^*), (T_t, S_t, R_t)\big) < \tau_1 \qquad (6)$$
and a negative training set ΦS− (126) if
$$\mathrm{dist}\big((T_i', S_j, R^*), (T_t, S_t, R_t)\big) > \tau_2 \qquad (7)$$
for i=1, 2, . . . , M and j=1, 2, . . . , n. Here dist((Ti′, Sj, R*), (Tt, St, Rt)) is a box-to-box distance function that measures 3D box differences in both ΩT and ΩS. More generally, such a box-to-box distance function can be expressed as:
$$\mathrm{dist}(\mathrm{box}_1, \mathrm{box}_2) = \frac{1}{8} \sum_{i=1}^{8} \lVert v_{1i} - v_{2i} \rVert \qquad (8)$$
where v1i is one of the eight vertices of box1 and v2i is the corresponding vertex of box2, and ∥v1i−v2i∥ is the Euclidean distance between the two 3D vectors v1i and v2i.
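The box-to-box distance can be sketched as follows. The sketch assumes the distance is the mean Euclidean distance over the eight corresponding vertices, and it represents orientation by three orthonormal axis vectors rather than by any particular angle parameterization. Vertex correspondence is fixed by enumerating the corners of both boxes in the same sign order. The function names are illustrative.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def box_vertices(center: Sequence[float], size: Sequence[float],
                 axes: Sequence[Sequence[float]]) -> List[Vec3]:
    """Eight corners of an oriented box; axes[k] is the unit vector of box
    axis k and size[k] the box extent along it. Corners are enumerated in a
    fixed sign order so that corresponding corners of two boxes share an index."""
    corners = []
    for signs in product((-0.5, 0.5), repeat=3):
        corners.append(tuple(
            center[d] + sum(signs[k] * size[k] * axes[k][d] for k in range(3))
            for d in range(3)))
    return corners


def box_distance(box1, box2) -> float:
    """Mean Euclidean distance between corresponding vertices of two boxes,
    each given as (center, size, axes)."""
    v1, v2 = box_vertices(*box1), box_vertices(*box2)
    return sum(math.dist(a, b) for a, b in zip(v1, v2)) / 8.0
```

Unlike the center distance of equation (3), this distance changes when only the box size or orientation changes, which is what the scale and orientation stages need to separate positives from negatives.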
The positive training set ΦS+ (124) and the negative training set ΦS− (126) are used by the PBT learner 128 to train a second classifier PS, based on steerable features calculated from each of the candidate boxes. Once trained, the second classifier PS is used to classify the M×n box candidates {(Ti′, Sj, R*)}, i=1, 2, . . . , M; j=1, 2, . . . , n, and the top M candidates with the highest output probabilities are retained. These candidates are denoted as {(Ti′, Si′, R*)}, i=1, 2, . . . , M.
The M intermediate detections (candidates) resulting from the second classifier PS are used as a basis for step 130. At step 130, each candidate (Ti′, Si′, R*), i=1, 2, . . . , M, is further expanded in ΩR by scanning n orientation samples, resulting in M×n box candidates {(Ti′, Si′, Rj)}, i=1, 2, . . . , M; j=1, 2, . . . , n. This is shown at box 132 of
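Steps 120 and 130 share one operation: each of the M retained candidates is expanded into n boxes by scanning one parameter group while the others stay fixed, giving M×n boxes. A minimal sketch, assuming boxes are (T, S, R) tuples; `expand_subspace` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

Box = Tuple[tuple, tuple, tuple]  # (T, S, R)


def expand_subspace(candidates: Sequence[Box], values: Sequence[tuple],
                    index: int) -> List[Box]:
    """Replace parameter group `index` (0 = T, 1 = S, 2 = R) of every
    candidate with each of the given values, keeping the other groups fixed.
    Returns len(candidates) * len(values) boxes."""
    out: List[Box] = []
    for box in candidates:
        for v in values:
            new_box = list(box)
            new_box[index] = v
            out.append(tuple(new_box))
    return out
```

With `index=1` and candidates that carry the mean orientation R*, this yields the scale samples of step 120; with `index=2` it yields the orientation samples of step 130.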
In testing, three search steps are used to sequentially search ΩT, ΩS, and ΩR in order to detect the 3D object in an unlabeled data volume. In each step, 3D box candidates that are close to the global optimum (i.e., the object's true spatial configuration) are scanned and searched in the current parameter subspace (ΩT→ΩS→ΩR) using the learned models (classifiers) PT, PS, and PR, respectively. The output candidates are used as seeds of propagation in the next stage of the incremental parameter optimization, and the final stage yields the optimized spatial configuration of the 3D object in the data volume.
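The testing procedure can be sketched as a cascade in which each stage expands the surviving candidates in one subspace and keeps the best M. Here each stage is a pair of hypothetical callables: an expansion function for the subspace and the trained classifier (PT, PS, or PR) used as a scoring function.

```python
import heapq
from typing import Any, Callable, Iterable, List, Sequence, Tuple

Box = Any
Stage = Tuple[Callable[[Box], List[Box]], Callable[[Box], float]]


def detect(seeds: Iterable[Box], stages: Sequence[Stage], m: int) -> Box:
    """Search Omega_T, then Omega_S, then Omega_R, propagating the top m
    candidates of each stage as seeds for the next. Returns the single
    highest-scoring box of the final stage."""
    candidates = list(seeds)
    for expand, classifier in stages:
        samples = [box for c in candidates for box in expand(c)]
        # nlargest returns candidates ordered from best to worst.
        candidates = heapq.nlargest(m, samples, key=classifier)
    return candidates[0]
```

Keeping m > 1 lets a correct hypothesis that ranks slightly lower in an early subspace survive until a later classifier can confirm it.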
The ICV detection method of detects the spatial configuration (i.e., center position, scale, and orientation) of a 3D box bounding the ICV in a CT volume. Referring to
At step 302, candidate points for an orifice of the ICV are detected using a 3D point detector. The orifice is part of the anatomy of the ICV. If the ICV's orifice can be found, the ICV's position in ΩT is well constrained, so that no exhaustive search over position is needed. The ICV orifice has a distinct shape profile, which allows efficient detection using a 3D point detector. As described above, a 3D point detector involves less feature computation (5751 vs. 52185 for training) than a 3D box detector for direct ICV detection. Furthermore, the ICV orifice lies only on the colon surface, so all voxel locations inside the tissue or in the air can be pruned for faster scanning.
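The pruning of voxels inside tissue or in air can be sketched as keeping only non-air voxels that touch air. This is an assumption-laden illustration: the air threshold of -500 HU, the 6-connected neighborhood, and the nested-list volume layout indexed as `volume[z][y][x]` are hypothetical choices, not details taken from the text.

```python
from typing import List, Sequence, Tuple

Voxel = Tuple[int, int, int]

# 6-connected neighbor offsets (hypothetical choice of connectivity).
_OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def surface_voxels(volume: Sequence[Sequence[Sequence[float]]],
                   air_threshold: float = -500.0) -> List[Voxel]:
    """Return (z, y, x) of non-air voxels with at least one air neighbor.
    Voxels below air_threshold (hypothetical, in HU) are treated as air."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])

    def is_air(z: int, y: int, x: int) -> bool:
        return volume[z][y][x] < air_threshold

    result: List[Voxel] = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if is_air(z, y, x):
                    continue  # voxels in the air are pruned
                for dz, dy, dx in _OFFSETS:
                    zz, yy, xx = z + dz, y + dy, x + dx
                    if (0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx
                            and is_air(zz, yy, xx)):
                        result.append((z, y, x))
                        break  # one air neighbor is enough
    return result
```

Only the returned voxels would then be scanned by the orifice point detector; tissue voxels with no air neighbor are skipped.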
Returning to
Stage 305 of the method of
At step 306, ICV position box candidates are detected from the initial box candidates using a 3D box detector. To generate position samples, the center of each initial box candidate is shifted in one-voxel steps over a range, such as [−20, 20], along each of the X, Y, and Z coordinates (i.e., ΩT+ΔT). Using distance thresholding, this set of synthesized ICV box samples is split into positive (<θ1) and negative (>θ2) training sets for training the first classifier PT with PBT. For example, the distance thresholds can be set to θ1=5 voxels and θ2=25 voxels, but the present invention is not limited thereto. The classifier PT is trained using PBT based on the box-level steerable features, and is then used to classify the ICV box samples generated for each initial box candidate. The top M (e.g., 100) candidates in each CT volume are retained as the ICV position box candidates.
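The position sampling at step 306 can be sketched as below, assuming integer voxel coordinates for the candidate center; the function name is illustrative. With the example range [−20, 20], each initial box candidate yields 41³ = 68,921 shifted samples.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[int, int, int]


def shifted_centers(center: Point, radius: int = 20) -> List[Point]:
    """Every one-voxel shift of `center` within [-radius, radius] along X, Y, Z."""
    cx, cy, cz = center
    offsets = range(-radius, radius + 1)
    return [(cx + dx, cy + dy, cz + dz) for dx, dy, dz in product(offsets, repeat=3)]
```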
At step 308, ICV position and scale box candidates are detected from the ICV position box candidates using a 3D box detector. The size parameter of each ICV position box candidate resulting from step 306 is varied evenly in ΩS to generate box samples for that candidate. For example, the size can be varied in 2-voxel steps over the range [23, 51] voxels in the X direction, [15, 33] voxels in the Y direction, and [11, 31] voxels in the Z direction. These ranges can be calculated statistically from the annotated ICV dataset. Using the box-level steerable features and PBT, the second classifier PS is trained with distance thresholding; for example, distance thresholds of τ1=4 and τ2=20 can be used, but the present invention is not limited thereto. The second classifier PS is used to classify the generated box samples, and the top M are retained as the ICV position and scale box candidates.
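The example size ranges translate into a small grid of candidate sizes. A minimal sketch, assuming the ranges are inclusive and every combination of per-axis sizes is tested; this gives 15 × 10 × 11 = 1,650 size samples per ICV position box candidate. The names are illustrative.

```python
from itertools import product
from typing import Dict, List, Tuple

# Example size ranges from the text, in voxels (assumed inclusive).
SCALE_RANGES: Dict[str, Tuple[int, int]] = {"x": (23, 51), "y": (15, 33), "z": (11, 31)}


def scale_grid(ranges: Dict[str, Tuple[int, int]] = SCALE_RANGES,
               step: int = 2) -> List[Tuple[int, int, int]]:
    """All (sx, sy, sz) box sizes sampled every `step` voxels in each range."""
    axes = [range(lo, hi + 1, step) for lo, hi in (ranges["x"], ranges["y"], ranges["z"])]
    return list(product(*axes))
```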
At step 310, the position, scale, and orientation of the ICV are detected from the ICV position and scale box candidates using a 3D box detector. In this step, box samples are generated by adaptively adding disturbances to the orientation parameters of the ICV position and scale box candidates, which were previously aligned in step 304 (i.e., ΩR+ΔR). For example, ΔR can vary in steps of 0.05 in [−0.3, 0.3], 0.1 in [−0.9, −0.3) ∪ (0.3, 0.9], and 0.3 in [−1.8, −0.9) ∪ (0.9, 1.8]. This provides finer sampling close to the current orientation parameters (retained from PR′ in step 304), which improves the ΩR detection accuracy. Distance thresholding is used to divide the box samples into positive and negative training sets; for example, distance thresholds of η1=4 and η2=15 can be used, but the present invention is not limited thereto. The third classifier PR is then trained using the box-level steerable features and PBT, and PR is used to classify the generated box samples. The box candidate (sample) with the highest probability value from PR is output as the final ICV detection result. The 9D spatial configuration (position, scale, and orientation) of the box bounding the ICV is thus given by the position, scale, and orientation parameters of the final ICV detection result.
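The adaptive sampling schedule for ΔR can be sketched as follows. The sketch produces the offset values for a single orientation parameter (31 values); how the offsets are combined across the three orientation parameters, and the angular unit, are not stated in the text and are left open here.

```python
from typing import List


def orientation_offsets() -> List[float]:
    """Offsets for one orientation parameter: step 0.05 in [-0.3, 0.3],
    0.1 in (0.3, 0.9] and its mirror, 0.3 in (0.9, 1.8] and its mirror."""
    # Integer multipliers avoid accumulating floating-point error; round()
    # removes residue such as 0.30000000000000004.
    fine = [round(0.05 * k, 4) for k in range(-6, 7)]   # -0.30 .. 0.30
    medium = [round(0.1 * k, 4) for k in range(4, 10)]  # 0.4 .. 0.9
    coarse = [round(0.3 * k, 4) for k in range(4, 7)]   # 1.2, 1.5, 1.8
    return sorted(fine + medium + [-v for v in medium] + coarse + [-v for v in coarse])
```

The spacing grows with distance from zero, so most samples fall near the orientation already estimated in step 304.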
The ICV detection result can be output by storing it in the memory or storage of a computer system, displaying it on an image of the CT volume, etc. The ICV detection result can be used to reduce false positives in colon polyp detection; for example, it can serve as a post-filter for a colon polyp classification system.
The above-described methods for 3D object detection, and in particular, ICV detection in CT volumes, may be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. A high level block diagram of such a computer is illustrated in
The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.
This application claims the benefit of U.S. Provisional Application No. 60/887,895, filed Feb. 2, 2007, the disclosure of which is herein incorporated by reference.
Number | Date | Country
---|---|---
20080211812 A1 | Sep 2008 | US

Number | Date | Country
---|---|---
60887895 | Feb 2007 | US