This invention relates generally to computer vision, and more particularly to image segmentation.
In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions, i.e., disjoint sets of pixels. The goal of segmentation is to simplify and/or change the representation of the image into something that is more meaningful and easier to analyze. Segmentation is typically used to locate objects and boundaries in images. The result of image segmentation is a set of regions, or a set of contours extracted from the image.
However, automatic segmentation of an object in an image is challenging in the presence of image noise, background clutter and occlusions. In semi-automatic segmentation, a user specifies a region of interest (ROI), and segmentation methods are applied to determine the contour that best fits the object in the ROI.
A random walk (RW) is a mathematical formalization of a trajectory that includes taking successive random steps. Specific cases or limits of random walks include the drunkard's walk and Lévy flight. Random walks are related to diffusion models and are a fundamental topic in discussions of Markov processes. Properties of random walks, including dispersal distributions, first-passage times and encounter rates are well known.
An image can be segmented using random walk segmentation by solving a sparse system of linear equations, Grady: “Random walks for image segmentation,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 28:1768-1783, 2006.
Grady also describes the incorporation of prior information into the random walk segmentation, Grady, “Multilabel random walker image segmentation using prior models,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005. The prior information used is color prior probabilities in the form of a Gaussian mixture learned from training data.
However, the color prior probabilities do not always produce satisfactory results for the random walk segmentation, e.g., in the presence of image noise, background clutter and occlusions. Thus, it is desirable to use the random walk segmentation with other types of prior information.
It is an object of the invention to provide a method for image segmentation using spatial random walks.
The invention is based on the realization that a segmented object conforms to a spatial constraint, and that incorporating such a constraint into the spatial random walk segmentation leads to superior segmentation results.
Therefore, embodiments of our invention incorporate a shape prior into the random walk segmentation method. Using the shape prior representation and the associated shape distance measures, we can segment objects of complex shapes even in the presence of image noise, background clutter and occlusions. Furthermore, some embodiments of the invention select the shape prior from multiple shape priors for a particular segmentation.
We first obtain an initial segmentation from any conventional segmentation method and align the shape prior with the initial segmentation. Once aligned, we iteratively segment the image using a spatial random walk based on the shape prior of the previously segmented region to produce a next segmented region. We compare the next segmented region with the previous segmented region, and repeat the segmenting and the comparing until the previous and next segmented regions converge. We then select the next segmented region as the final segmented region.
The image 110 is segmented 120 to produce a segmented region 130. For example, in one embodiment we use a random walk segmentation for the initial segmentation of the image based on a seed 125. The seed is selected by a user specifying a region of interest (ROI) in the image. A Laplacian matrix 117 of the image 110 is computed 115 and provided as an input to the spatial random walk segmentation 150. The image 110 with the final segmented region 190 is an output of the method 100.
The embodiments of the invention iteratively segment 150 the image with the spatial random walk segmentation based on the shape prior 145 to produce a next segmented region 155 until 160 the next segmented region converges with the previous segmented region 130. After the converging, we select the next segmented region as the output segmented region 190.
The shape prior is a predetermined shape of a region to be segmented, i.e., the segmented region 190. In one embodiment, the shape prior has a different scale or orientation than the region 190; only a general similarity is required. For some applications the region to be segmented is known, e.g., we are segmenting an eye in an image of a face. However, the shape prior can have an arbitrary form. For example, if the region 190 is a tumor to be segmented in a scan image, the shape prior can be acquired from previous tumor segmentations. Furthermore, if the method is used for tracking an object, the shape prior can be received from previous tracking results. In an alternative embodiment, the shape prior 145 is selected 200 from a set of shape priors 210, as described below.
In some embodiments, before the shape prior is provided to the spatial random walk segmentation, we first align 140 the shape prior with the previous segmented region 130, and next smooth 300 the aligned shape prior 143. Hence, a smoothed aligned version 310 of the shape prior is provided to the segmentation.
In one embodiment, we smooth the aligned shape prior adaptively, based on a value of the iteration index 147, as described in greater detail below.
If another iteration of the segmenting 150 is necessary 167, then the previous segmented region is replaced 170 with the result of the segmentation, i.e., the next segmented region. Accordingly, we repeat the alignment, increase an iteration index 147, and repeat the segmenting 150.
In one embodiment, we use the original shape prior 145 for the alignment. However, in another embodiment, the aligned shape prior 143 determined during the previous iteration of the segmentation is used for the alignment 140.
Selecting Shape Prior
In one embodiment, a similarity score is determined for each shape prior in the set 210 by comparing pixel intensities. We select 240 the shape prior 145 having the highest similarity score from the set of similarity scores.
Alignment
In one embodiment, we align the shape prior to the current segmentation result by computing the difference between affine-transformed versions of the shape prior and the segmented region. We select the transformation that gives the minimum difference. Because an exhaustive search is time consuming, some embodiments use a fast search, such as first aligning the centers of mass of the shape prior and the segmented region, and then solving for the rotation and scale, etc. One embodiment uses a RANdom SAmple Consensus (RANSAC) based alignment. In an alternative embodiment, we use image normalization that is invariant to translation, rotation, scaling and skew.
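The first fast-search step above can be sketched as follows. This is a minimal illustration with hypothetical masks; the function names and toy values are ours, not the specification's, and rotation, scale, and RANSAC refinements are omitted:

```python
import numpy as np

# Sketch of the first fast-alignment step: translate the shape prior
# so that its center of mass coincides with that of the segmented region.
def centroid(mask):
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()

def align_by_centroid(prior, region):
    dy = int(round(centroid(region)[0] - centroid(prior)[0]))
    dx = int(round(centroid(region)[1] - centroid(prior)[1]))
    # integer translation; np.roll is a toy stand-in for proper resampling
    return np.roll(prior, (dy, dx), axis=(0, 1))

region = np.zeros((8, 8)); region[4:6, 4:6] = 1  # hypothetical segmented region
prior = np.zeros((8, 8)); prior[1:3, 1:3] = 1    # hypothetical shape prior
aligned = align_by_centroid(prior, region)
```

After this step the two centers of mass coincide; the remaining rotation and scale would then be solved over a much smaller search space.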
Smoothing
In one embodiment, parameters of the smoothing function are adjusted depending on the iteration index. As the next segmentation converges to the given shape, the weight of the shape prior in the spatial random walk segmentation is increased as well, to enforce better shape matching.
Converging means that the next segmented region is similar to the previous segmented region. In other words, the segmentation has stabilized. In one embodiment, we allow minor changes due to oscillations.
Random Walk Segmentation
The segmentation method of the seeded image uses a random walk starting from each pixel of the input image until a labeled seed pixel is reached. This is equivalent to minimization of the Dirichlet integral:

D[u]=(1/2)∫Ω|∇u|2dΩ, (2)

for a field u and a region Ω. The Euler-Lagrange equation for Equation (2) is the Laplace equation:

∇2u=Δu=div grad u=0, (3)

where div stands for divergence and grad stands for gradient. Based on the definition of harmonic functions, the solution that minimizes Equation (2) is a harmonic function because harmonic functions satisfy the Laplace equation in Equation (3).
The problem of finding the harmonic function u subject to boundary values is the "Dirichlet boundary problem." In the context of image segmentation, the Dirichlet boundary problem is defined on a graph G={V, E} including vertices V connected by edges E. Each vertex vi represents a pixel i in the image, and an edge eij represents the connection between vertices vi and vj according to an adjacency operator. A corresponding weight wij of the edge represents a strength of the connection between two vertices and introduces bias into the random walk.
The combinatorial formulation of Equation (2) is

D[r]=(1/2)rTLr, (4)

where r is a harmonic function in the image domain, in which each coefficient corresponds to a pixel and indicates a likelihood of the pixel being from a class, and L is a combinatorial divergence and gradient (div grad) operator, i.e., the Laplacian matrix

Lij=di if i=j, Lij=−wij if eij∈E, and Lij=0 otherwise, (5)

where di=Σjwij is the degree of vertex vi. Hence, the solution of the Dirichlet boundary problem is
Lr=0 (6)
The Laplacian matrix L has a rank of at most N−1, where N is the number of pixels in the image. As a result, Equation (6) is singular and is solved, for example, by providing labels for some of the pixels, which makes Equation (6) conditioned.
We partition the function r as rM for labeled pixels, i.e., seeds, and rU for unlabeled pixels. By reordering the entries in the matrix L, we rewrite Equation (4) as

D[rU]=(1/2)(rMTLMrM+2rUTBTrM+rUTLUrU),

where LM and LU are the blocks of L corresponding to the marked and unmarked pixels. By taking the derivative of D[rU] with respect to rU and setting it to zero, we obtain the following system of linear equations:
LUrU=−BTrM, (7)
wherein B is a portion of the matrix L and T is the transpose operator. Equation (7) is a sparse, symmetric, positive-definite system of linear equations.
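Equation (7) can be illustrated numerically. The following sketch is our toy example, not from the specification: it solves the system on a six-pixel chain with unit edge weights and seeds at both ends, where the harmonic solution interpolates linearly between the seed values.

```python
import numpy as np

# Toy instance of Equation (7): a 1-D "image" of 6 pixels on a chain
# with unit edge weights. Pixels 0 and 5 are seeds with probabilities
# 1 and 0 for the foreground label.
N = 6
W = np.zeros((N, N))
for i in range(N - 1):          # chain adjacency, unit weights
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W  # graph Laplacian, Equation (5)

marked = [0, 5]                 # seed pixels
unmarked = [1, 2, 3, 4]
rM = np.array([1.0, 0.0])       # seed probabilities

LU = L[np.ix_(unmarked, unmarked)]      # unlabeled block of L
B = L[np.ix_(marked, unmarked)]         # off-diagonal block of L
rU = np.linalg.solve(LU, -B.T @ rM)     # Equation (7): LU rU = -B^T rM
# rU -> [0.8, 0.6, 0.4, 0.2], the linear interpolation between the seeds
```

On a real image the system is large and sparse, so a sparse solver would replace the dense `np.linalg.solve` used here for brevity.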
A function rs assigns to each vertex vi a probability ris of taking on label s, where s∈{1, 2, . . . , K}. Given that rM for the labeled pixels has values of 0 or 1, the rM on the right-hand side of Equation (7) can be replaced by a matrix M, where each row of M is a length-K indicator vector:
Therefore, for label s, the solution rs is obtained by solving:

LUrs=−BTms,

and for all labels:

LUR=−BTM.
Let dim(•) denote the dimensionality of a given matrix; we have dim(LU)=NU×NU, dim(R)=NU×K, dim(B)=NM×NU and dim(M)=NM×K, where NU represents the number of unmarked pixels, NM is the number of marked pixels, and therefore N=NU+NM. Because ris is the probability of vertex vi taking on label s, ri satisfies the condition

Σsris=1,

thus only K−1 sparse linear systems need to be solved.
Spatial Random Walk Segmentation
The spatial random walk segmentation uses a linear equation 157
(L+v)r=vH(φ0),
wherein L is a Laplacian matrix, v is the weight of the shape prior during the segmentation, H(•) is the smoothing function, φ0 is the shape prior, and r is a vector describing the next segmented region.
Each coefficient in the Laplacian matrix corresponds to a link between a pair of pixels. The value of the coefficient is based on the application.
In one embodiment, if two pixels are not adjacent, then the coefficient is zero. Otherwise, the coefficient has a value derived from intensity values of the two adjacent pixels. The values of diagonal coefficients are the negative of the sum of the coefficients corresponding to the adjacent pixels. The Laplacian matrix, sometimes called admittance matrix or Kirchhoff matrix, is a matrix representation of a graph.
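A minimal construction of such a Laplacian matrix for a one-dimensional intensity image might look as follows. This is an illustrative sketch (our values), using the conventional exponential weights described later in this description:

```python
import numpy as np

# Sketch of building the Laplacian matrix for a tiny 1-D intensity image.
# Off-diagonal coefficients are -w_ij for adjacent pixels i, j; the
# diagonal holds the sum of the adjacent weights, so each row sums to zero.
def image_laplacian(pixels, beta=1.0):
    n = len(pixels)
    L = np.zeros((n, n))
    for i in range(n - 1):  # adjacent pixels only; others stay zero
        w = np.exp(-beta * (pixels[i] - pixels[i + 1]) ** 2)
        L[i, i + 1] = L[i + 1, i] = -w
    np.fill_diagonal(L, -L.sum(axis=1))  # degree on the diagonal
    return L

L = image_laplacian([0.1, 0.2, 0.9, 1.0], beta=10.0)
# The weak edge between intensities 0.2 and 0.9 yields a near-zero weight,
# so the random walk rarely crosses that intensity boundary.
```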
Shape Prior
In an energy minimization framework, image segmentation with a shape prior is formulated as an energy function with the shape prior coded as a regularization term. The random walk in Equation (4) minimizes an original energy Erw. Thus, the general regularization framework for incorporating the shape prior is

E=Erw+vEsp, (8)

where Esp describes how the segmentation matches the prior shape, and v is a weight parameter that controls the impact of the shape prior on the spatial random walk. In one embodiment, the weight parameter is correlated with the iteration of the segmentation 150, i.e., the iteration index 147; an increase of the iteration index increases the weight parameter.
In one embodiment, we describe the shape prior energy Esp as a shape distance function.
Thus, the shape prior energy is:

Esp=∫Ω(H(φ)−H(φ0))2dx, (9)

where φ is a level set function of the segmentation, φ0 is a level set function of the prior shape, x∈Ω, and Ω is the domain of integration. H(•) is the unit step function. We replace H(φ) with a harmonic function r and use a smoothed function Hε(•) for the shape prior φ0, where ε here is a smoothing factor.

In the image domain, we rewrite Equation (9) as:

Esp=Σi(ri−Hε(φ0(i)))2, (11)

where i is an index of a pixel.
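One common concrete choice for a smoothed step function in level-set methods is the arctangent form below. The particular form is our assumption for illustration; the description above only requires that H(•) be smoothed by a factor ε:

```python
import math

# An assumed smoothed step function (arctangent form, standard in
# level-set segmentation): H_eps(phi) = 1/2 * (1 + (2/pi) * atan(phi/eps))
def smoothed_heaviside(phi, eps):
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(phi / eps))

# phi > 0 inside the prior shape, phi < 0 outside, phi = 0 on its boundary
inside = smoothed_heaviside(3.0, 0.5)    # close to 1
boundary = smoothed_heaviside(0.0, 0.5)  # exactly 0.5
outside = smoothed_heaviside(-3.0, 0.5)  # close to 0
```

A smaller ε sharpens the transition at the shape boundary, which matches the adaptive smoothing of the aligned shape prior described above.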
To minimize the energy functional in Equation (8), we solve the linear equation 157:
(L+v)r=vH(φ0). (12)
Equation (12), in one embodiment, is solved using matrix inversion and/or least-squares solutions. However, a solution of Equation (12) requires proper alignment of the prior shape φ0 to the image, which in turn requires the initialization of r, as described above.
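The effect of the weight v in the linear equation 157 can be seen on a toy system. All values below are hypothetical and for illustration only:

```python
import numpy as np

# Toy instance of Equation (12) on a 4-pixel chain. H_phi0 is the smoothed
# shape prior sampled at each pixel; v controls how strongly the solution
# is pulled toward it.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

H_phi0 = np.array([1.0, 1.0, 0.0, 0.0])  # prior: first two pixels foreground

def spatial_random_walk(v):
    # (L + v I) r = v H(phi0)
    return np.linalg.solve(L + v * np.eye(4), v * H_phi0)

r_weak = spatial_random_walk(0.1)     # smooth, weakly constrained solution
r_strong = spatial_random_walk(100.0) # follows the shape prior closely
```

With a small v the Laplacian term dominates and the solution is smoothed across the chain; with a large v the solution reproduces the shape prior almost exactly, which is why the weight is increased as the iterations converge.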
The weight parameter v is selected empirically. However, to favor the shape over the intensity/appearance information in later iterations rather than in the initial iterations, in some embodiments the weight value is increased with the iteration index.
Example Embodiment
In one embodiment, we use image color intensity prior probability information in conjunction with the shape prior information. Therefore, our energy term takes the form:

E=Erw+vEsp+γEcolor. (13)
To model Ecolor, we use a normalized histogram from the Gaussian distribution kernels for the foreground and background seeds. The pixelwise prior probabilities are obtained for each unlabeled pixel. Ecolor in Equation (13) leads to faster convergence than using the shape prior alone. Hence, the linear system according to this embodiment is:
(L+v+γ)rs=γλUs+vH(φ′0)−BTms, (14)
where s refers to label s, e.g., foreground or background. The weight parameter γ is selected empirically to accelerate convergence.
In the shape prior, the smoothing factor ε in H(•) is adjusted during each iteration based on the iteration index. As the segmentation converges to the given shape, the value ε approaches 1 to enforce shape matching. The weight parameter v is selected empirically.
The weights wij are constructed conventionally:
wij=exp(−β(pi−pj)2),
where pi,pj refer to the i-th and j-th pixel values respectively, and β is the variance for controlling the magnitude of the weights. The value β can be adjusted for different images. In general, if the image contrast is strong, then a smaller β is preferred to emphasize the smoothness of color/intensity variation among the neighboring pixels.
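The behavior of β can be checked directly (illustrative intensity values, chosen by us):

```python
import math

# Conventional edge weight: similar neighbors get a weight near 1,
# a strong intensity edge gets a weight near 0, and increasing beta
# sharpens that contrast.
def weight(pi, pj, beta):
    return math.exp(-beta * (pi - pj) ** 2)

flat = weight(0.50, 0.52, beta=50.0)  # similar neighbors -> near 1
edge = weight(0.10, 0.90, beta=50.0)  # strong edge -> near 0
```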
Effect of the Invention
The embodiments of the invention yield improved image segmentation over the conventional segmentation methods.
The conventional random walk is a general purpose segmentation method. However, in the presence of background clutter or in the absence of image intensity/color variation, the segmentation obtained might not be useful if the seeds are misplaced.
However, in many practical applications, the prior information is available and can be exploited to improve the segmentation result. For example,
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
Number | Name | Date | Kind |
---|---|---|---|
6031935 | Kimmel | Feb 2000 | A |
7680312 | Jolly et al. | Mar 2010 | B2 |
20020184470 | Weese et al. | Dec 2002 | A1 |
20030099391 | Bansal et al. | May 2003 | A1 |
20030198382 | Chen et al. | Oct 2003 | A1 |
20050238215 | Jolly et al. | Oct 2005 | A1 |
20060050959 | Grady et al. | Mar 2006 | A1 |
20060120524 | Avidan et al. | Jun 2006 | A1 |
20060159343 | Grady | Jul 2006 | A1 |
20060239553 | Florin et al. | Oct 2006 | A1 |
20060251302 | Abufadel et al. | Nov 2006 | A1 |
20060285745 | Paragios et al. | Dec 2006 | A1 |
20070014473 | Slabaugh et al. | Jan 2007 | A1 |
20070025616 | Grady et al. | Feb 2007 | A1 |
20080037871 | Sinop et al. | Feb 2008 | A1 |
20090052763 | Acharyya et al. | Feb 2009 | A1 |
Number | Date | Country | |
---|---|---|---|
20100246956 A1 | Sep 2010 | US |