The invention relates generally to data segmentation, and more particularly to segmenting pixels in images.
Data segmentation is used extensively in many computer applications. In computer vision, the segmentation operates on 2D images of pixels or 3D volumetric data of voxels. For a segmentation x, a spectral segmentation method with multi-scale graph decomposition minimizes xTAx/(xTDgx), where A is an affinity matrix, Dg is a diagonal matrix, and T is the transpose operator. Some methods treat image segmentation as a graph partitioning problem where a normalized cut criterion measures a dissimilarity between different groups of pixels and a similarity within the groups. Random walk is a seeded segmentation method that determines the probability that a walk starting at each unlabeled pixel y(i,j) reaches prelabeled pixels by solving a closed-form equation using a graph Laplacian with weights A(i,j)=exp(−(yi−yj)2/θ), where θ is a global scaling factor, see e.g., U.S. Pat. Nos. 7,286,127, 7,692,664.
A matting Laplacian matrix can be derived from multiple matte equations. In comparison with the random walk and normalized cuts, that method adopts a correlation measure instead of an exponential of a color distance, uses local scaling instead of global scaling, and formulates a least-squares solution with constraints from user input. Local scaling leads to better clustering, especially when the data include multiple scales and the clusters are placed within a cluttered background.
A structure of eigenvectors can be analyzed to infer automatically the number of groups, instead of relying on increases in eigenvalue magnitudes. Another method uses a dark channel prior to model the thickness of haze and applies the matting Laplacian to refine a transmission map.
The embodiments of the invention provide a method for segmenting n-dimensional data, for example, two-dimensional (2D) data that represent pixels in one or more images acquired by a sensor. The data can also be 3D, such as volumetric data obtained from medical or geological scans. Higher dimensional data can also be segmented.
The method identifies target data, e.g., pixels or voxels of interest that are associated with ‘foreground' regions in the images. The method uses a graph Laplacian spectrum constraint to incorporate point-wise scalar prior vectors for the binary segmentation. The prior vectors align a rough, incomplete, or noisy initial segmentation, e.g., a foreground mask, a saliency map, a defocus field, or an object detection window, to preferred structures in the n-dimensional data, e.g., to object boundaries or gradients. The segmentation uses an objective function.
Alternative embodiments include projection to a null-space, a convex function with l2-norm, a convex function with l1-norm, a sparse decomposition, or a robust function, known as a Welsch function in the art of robust statistics.
Specifically, a method segments n-dimensional data by first determining prior information from the data. A fidelity term is determined from the prior information, and the data are represented as a graph.
A graph Laplacian is determined from the graph, and a Laplacian spectrum constraint is determined from the graph Laplacian. Then, an objective function is minimized according to the fidelity term and the Laplacian spectrum constraint to identify a segment of target points in the data.
Segmentation Method
The embodiments of our invention provide a method for segmenting n-dimensional data. The data can be acquired by a sensor or constructed by some other means. In an example application, a binary segmentation locates areas of interest in one or more images by partitioning a foreground region and detecting an object surface. In one embodiment, the object is a human organ, and the images provide volumetric data, e.g., as acquired by medical imaging.
The method uses a graph Laplacian spectrum constraint to impose structure and point-wise constraints during the segmentation. As known in the art and used herein, the Laplace operator is the second order differential operator defined as the divergence of the gradient in a Euclidean space. It is understood that the method can segment any n-dimensional data.
As shown in the figure, we represent the n-dimensional data y with a graph G, and determine 120 a graph Laplacian L, which is used to determine 140 a Laplacian spectrum constraint ∥Lx∥ 104. The fidelity term and the spectrum constraint are used to optimize an objective function 200 that produces the segment x 103 when the objective function is optimized.
We can use our method to partition the data into multiple segments. This can be done iteratively, e.g., by removing the identified segment from the data after each segmentation step and repeating the segmentation on the remaining part of the data to obtain a non-overlapping set of partitions, or by changing and updating the prior to obtain multiple, possibly overlapping segments.
For temporal data, for instance a given video sequence, we can apply our segmentation method to track target objects. In this case, we set the prior information to the object region in the previous frame. We compute the graph Laplacian from the current frame and segment the current frame. The identified segment in the current frame corresponds to the object region in the current frame. This process is repeated by using the previous region as a prior for the current frame to track and segment target objects, as sketched below.
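The following is a minimal Python sketch of this tracking loop; the routine segment(frame, prior), which would apply the segmentation method described herein to a single frame, is a hypothetical placeholder, and reusing the previous region directly as the next prior is only one possible choice.

    def track(frames, initial_prior, segment):
        # frames: sequence of images; initial_prior: object region in the first frame.
        # segment(frame, prior): hypothetical routine returning the identified segment
        # (e.g., a binary mask) for one frame, given the prior from the previous frame.
        prior = initial_prior
        regions = []
        for frame in frames:
            region = segment(frame, prior)  # graph Laplacian is computed from the current frame
            regions.append(region)
            prior = region                  # previous region becomes the prior for the next frame
        return regions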
Objective Functions
As shown in
Our method can use different priors. In addition to the ones above, we use a set of labeled pixels selected by a user operator or by another process as the prior information for interactive image and data segmentation. Such labeled pixels can be obtained as annotations from image labeling methods, multiple users, or a feedback control process.
The method and other procedures described herein can be performed in a processor 100 connected to memory and input/output interfaces as known in the art.
Graph Laplacian
The image y is represented with the graph G. This structure-imposing graph is constructed by assigning each point (pixel) in y as a vertex and connecting the vertices via weighted edges within N-connectivity; in other words, each vertex is an image pixel, and each edge carries the affinity value of two pixels: the center pixel of a patch containing N+1 pixels and one of the other pixels in that patch. Therefore, the graph G is sparse and is almost an N-regular graph, except at the boundary vertices, which have fewer than N neighbors. For a 2D image y of size √n×√n, G has n vertices. Different connectivity and weighting schemes can generate different weighted graphs. It should be understood that the graph can be constructed similarly for higher dimensional data.
The graph Laplacian L, a positive-semidefinite matrix representation of the graph G, is defined as L=Dg−A, where A is the adjacency matrix of G, and Dg is a diagonal degree matrix with Dg(i,i)=ΣjA(i,j).
Instead of a binary adjacency matrix, many applications use a weighted adjacency, i.e., A(i,j)=ω(i,j), where ω can be a function measuring the affinity of two vertices.
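As one possible illustration, and not the only construction contemplated, the following Python sketch builds a 4-connectivity graph for a 2D image with exponential affinity weights and forms L=Dg−A as defined above; the scaling parameter theta and the use of scipy.sparse are our own assumptions.

    import numpy as np
    import scipy.sparse as sp

    def image_graph_laplacian(y, theta=0.1):
        # y: 2D image with intensities in [0, 1]; each pixel is a vertex, and each
        # pixel is connected to its right and lower neighbors (4-connectivity).
        h, w = y.shape
        idx = np.arange(h * w).reshape(h, w)
        rows, cols, vals = [], [], []
        for di, dj in [(0, 1), (1, 0)]:  # right and down neighbors
            a = idx[:h - di, :w - dj].ravel()
            b = idx[di:, dj:].ravel()
            diff = y[:h - di, :w - dj].ravel() - y[di:, dj:].ravel()
            wgt = np.exp(-(diff ** 2) / theta)            # exponential affinity weight
            rows += [a, b]; cols += [b, a]; vals += [wgt, wgt]  # symmetric edges
        A = sp.csr_matrix((np.concatenate(vals),
                           (np.concatenate(rows), np.concatenate(cols))),
                          shape=(h * w, h * w))
        Dg = sp.diags(np.asarray(A.sum(axis=1)).ravel())  # degree matrix Dg(i,i)
        return (Dg - A).tocsr()                           # graph Laplacian L = Dg - A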
Various forms of the graph Laplacian matrix have been adopted for different applications, such as image segmentation by normalized cuts, image segmentation by random walks, data classification, and matte estimation.
The matting Laplacian matrix can be derived from a matting equation, and the matrix can use a local scaling scheme for each vertex to allow self-tuning of the vertex-to-vertex similarity according to local statistics. Instead of determining an intensity distance or a Mahalanobis distance between two pixels, the matting Laplacian determines a relaxed correlation measurement of different pixels within a local 3×3 window, which in essence corresponds to a 24-neighborhood connectivity when the matting equations are analyzed. The random walk segmentation often defines L for a 4- or 8-neighborhood, and uses an exponential function for the weights ω.
Laplacian Spectrum Constraint
To incorporate the prior information, we determine the graph Laplacian matrix L from G. In other words, the Laplacian matrix L regularizes our under-constrained optimization formulation using the structure inherent in y. This enables us to define the binary segmentation problem as the least-squares constrained optimization

minx∥x−x*∥2 subject to Lx=0. (2)
The constraint Lx=0 is the Laplacian spectrum constraint. This is a generalization of the conventional approaches and, unlike the matting Laplacian, does not require a specific numerical solver.
For the Laplacian matrix L, we have the following property. The multiplicity of λ=0 as an eigenvalue of L is equal to the number of connected components in the graph G. The vector e=[1, . . . , 1]T is an eigenvector for L with eigenvalue 0:
If G1, . . . , Gc are the connected components of G, then L partitions into block matrices L1, . . . , Lc. Let k denote the multiplicity of the 0 eigenvalue. Each Li has an all-ones eigenvector zi (extended by zeros to length n) with 0 eigenvalue, so k≧c. Conversely, let v=[v1, . . . , vn]T be an eigenvector for eigenvalue 0, and within any connected component let vi be the largest entry of v. Then 0=(Lv)i=ΣjA(i,j)(vi−vj), where every term is nonnegative, which implies that vi=vj for all neighbors j of i, and hence, repeating the argument, vi=vj whenever i and j are in the same connected component of G. Therefore, every 0 eigenvector lies in the span of z1, . . . , zc, and k=c.
Here a ‘connected component’ represents a subgraph of G in which any two vertices are connected to each other, and are not connected to any other vertices in a remaining part of the graph. In our image segmentation context, a connected component corresponds to regions having the same label.
The above property indicates that the spectrum of L determines the number of connected components in G. This property ensures that the different connected subgraphs are perfectly segmented. For example, let λi be the ith smallest eigenvalue of L, λ1≦λ2≦ . . . ≦λn.
Then, we have λ1=0 because Le=0, where e is the above all-ones vector in Rn. This can be directly derived from the definition of the Laplacian matrix. Suppose the multiplicity of the 0 eigenvalue is k, that is, λ1= . . . =λk=0 and 1≦k≦n. Obviously, k is the dimension of L's null-space null(L), and the k smallest eigenvectors corresponding to these 0 eigenvalues comprise a basis of this null-space. An arbitrary invertible linear transformation of these k eigenvectors generates another basis.
We are interested in a specific basis such that each of these k orthogonal vectors has 1 for all the vertices of a component of the graph and 0 for the rest of the vertices, and the sum of these k vectors is e. This ‘ideal' basis gives us the perfect segmentation x of the input n-dimensional data y. However, due to numerical errors and the limited connectivity of the graph G, one cannot determine k by simply examining the multiplicity of the 0 eigenvalue. A better way is to search for a significant change in the magnitude of the eigenvalues starting from λ1, as sketched below. In practice, the numerical stability of estimating k highly depends on the noise, the data structure, and the construction of G, and thus L.
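One way to search for such a jump is sketched below; the number m of computed eigenvalues, the small negative shift used for the sparse eigensolver, and the largest-gap rule are illustrative assumptions rather than part of the method.

    import numpy as np
    import scipy.sparse.linalg as spla

    def estimate_nullity(L, m=20):
        # Estimate k, the number of (numerically) zero eigenvalues of the sparse,
        # positive-semidefinite Laplacian L, from the largest gap among its m
        # smallest eigenvalues.
        vals = spla.eigsh(L, k=m, sigma=-1e-6, return_eigenvectors=False)
        vals = np.sort(np.abs(vals))
        gaps = np.diff(vals)                 # successive eigenvalue gaps
        return int(np.argmax(gaps)) + 1      # eigenvalues before the largest jump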
The graph Laplacian spectrum constraint enforces a given structure in the input data on the prior information as expressed by the data fidelity term ∥x−x*∥2. At the same time, the objective function, as described below, achieves accuracy in the presence of outliers. With this constraint, the optimal segment x lies in the null-space of L, that is, x is constant within each connected component of the graph G. In most cases, the desired binary segmentation result, e.g., foreground and background regions, includes several disconnected components. Because the segment x can be represented by a linear combination of the 0 eigenvectors, or the ‘ideal' basis, the objective function is still able to differentiate the foreground components from the background components. In this way, we can explicitly avoid determining L's nullity k and its basis, while still using the structure in the input data to regularize the data fidelity term.
Objective Function
We describe alternative objective functions to enforce our Laplacian spectrum constraint Lx=0 on the fidelity term ∥x−x*∥2. Depending on the norm, several objective functions can be used.
Projection onto Null-Space
Estimating an optimal segment x for the constrained optimization in Eq. (2) can be considered as a search for the vector in the null-space of L that has the smallest distance to the prior information x*.
Let v1, . . . , vk∈Rn be the k eigenvectors of L corresponding to the 0 eigenvalue, and let W=span(v1, . . . , vk) be the k-dimensional subspace of Rn spanned by these eigenvectors. W is the null-space of L, null(L)=W. Let V=[v1, . . . , vk]; then the optimal solution can be estimated as
x=ProjW(x*)=Qx*, (4)
where Q is the projection matrix for the subspace W, and Q=V(VTV)−1VT=VVT because the vi are orthonormal.
The assumption is that the nullity k of L can be determined accurately, which is not always true. Another problem is that this approach approximates x using ProjW (x*), while a solution that is a linear combination of x* and ProjW (x*) is more favorable due to noise, limited connectivity of graph G, and computational load.
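A sketch of the projection in Eq. (4), assuming the nullity k has already been estimated (for example with the eigengap heuristic above); the eigensolver call and shift are illustrative.

    import scipy.sparse.linalg as spla

    def project_onto_nullspace(L, x_star, k):
        # V holds (approximately) the k eigenvectors of L with 0 eigenvalue,
        # returned with orthonormal columns by the sparse eigensolver.
        _, V = spla.eigsh(L, k=k, sigma=-1e-6)
        return V @ (V.T @ x_star)            # x = Q x* = V V^T x*   (Eq. 4)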
Convex Function with l2 Norm on Constraint
Instead of solving the constrained optimization in Eq. (2), we can transform it into an unconstrained minimization

minx∥x−x*∥2+β∥Lx∥2, (5)
with a penalty term β that enforces the structure in y.
Setting the derivative of the objective function Eq. (5) to 0, we obtain a closed form solution:
x=(βLTL+I)−1x*, (6)
where I is an identity matrix. Let P=(βLTL+I)−1, which can be viewed as a modified projection matrix.
We draw the connection between P and the previous Q. Because L is a real symmetric matrix, we can diagonalize it as L=VΛVT, where V is an orthogonal matrix V=[v1, . . . , vn], and Λ is a diagonal matrix constructed from the eigenvalues of L as Λ=diag(λ1, . . . , λn). Therefore, P can be rewritten as

P=V(βΛ2+I)−1VT=Σi=1n(1/(1+βλi2))viviT. (7)
Eq. (7) indicates that the solution of the penalized least squares in Eq. (5) is the weighted sum of the projections of x* onto each subspace viviT. Also, P adds the influence of the eigenvectors with non-zero eigenvalues into the final estimate based on their corresponding eigenvalues and the penalty term β. If β→∞, then P=Q and Eq. (5) becomes the constrained least squares in Eq. (2).
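A sketch of the closed-form update in Eq. (6) using a sparse direct solver; the value of beta and the choice of solver are illustrative assumptions only.

    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def l2_penalized_segment(L, x_star, beta=1e3):
        # Solve (beta L^T L + I) x = x*   (Eq. 6)
        n = L.shape[0]
        M = beta * (L.T @ L) + sp.identity(n, format='csc')
        return spsolve(M.tocsc(), x_star)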
Convex Function with l1 Norm on Constraint
Instead of enforcing the Laplacian spectrum constraint in the l2 norm, we can use the l1 norm to decrease the influence of the large outliers in the noisy prior. In this case, β is not required to approach ∞ in order to solve the original constrained minimization.
The objective function can be rewritten as

minx p∥x−x*∥2+∥Lx∥1, (8)

and solved using an augmented Lagrangian method, which replaces a constrained optimization problem by a series of unconstrained problems and adds a term to the unconstrained objective that mimics a Lagrange multiplier. Here, p is a scalar parameter that controls the contribution of the fidelity term with respect to the graph Laplacian spectrum constraint term.
Specifically, we use alternating direction methods in the following iterative framework
where a is an auxiliary vector, β is a penalty term, and LA(x,a,c) is the augmented Lagrangian function of Eq. (8) defined as
where c is the Lagrangian parameter vector that has the same length as a and x. Each of these subproblems can be solved directly by
where ∘ and sgn represent the point-wise product and the signum function, respectively.
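The alternating-direction iteration can be sketched as follows under the common splitting a=Lx, with a soft-thresholding (point-wise product with the signum function) update for a and a sparse linear solve for x; the parameters p, beta, the iteration count, and the update order are illustrative assumptions, not a unique prescription.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def soft_threshold(v, t):
        # sgn(v) o max(|v| - t, 0): point-wise shrinkage
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def l1_constrained_segment(L, x_star, p=1.0, beta=10.0, iters=50):
        # Alternating-direction iterations for  min_x  p||x - x*||^2 + ||Lx||_1
        # using the auxiliary vector a = Lx and Lagrangian parameter vector c.
        n = L.shape[0]
        x, c = x_star.copy(), np.zeros(n)
        M = ((2.0 * p) * sp.identity(n) + beta * (L.T @ L)).tocsc()
        for _ in range(iters):
            a = soft_threshold(L @ x + c / beta, 1.0 / beta)         # a-subproblem
            x = spsolve(M, 2.0 * p * x_star + L.T @ (beta * a - c))  # x-subproblem
            c = c + beta * (L @ x - a)                               # multiplier update
        return x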
The general total variation problem is solved by introducing an auxiliary variable, which moves the total variation ∇x out of the regularization term and turns the original problem into a constrained optimization. Then, the augmented Lagrangian method is applied to solve this transformed constrained optimization. In our case, we could solve Eq. (2) directly using the augmented Lagrangian method without enforcing the constraint in the l1 form. The results are very similar to the projection of x* onto null(L).
The l1 norm is effective when the error is in the form of impulsive noise. However, the regularization error is often continuous and has large values in arbitrary regions of the image.
Sparse Decomposition
As known in the art of compressed sensing (CS), also known as sparse sampling, sparsity refers to data or signals that are mostly null and have only a very small number of non-zero values. We use this sparsity concept to determine solutions for our underdetermined linear systems.
Another approach to apply the Laplacian spectrum constraint is to analyze its error map, i.e., xerr=Lx. An optimal solution of Eq. (2) has the property that most items of xerr have 0 value and only a few have large errors, which means we can rewrite Eq. (2) in terms of error sparsity as
where α=Lx and a decomposition dictionary D is defined as D=L+, because x=L+α and L+ is a pseudoinverse of L. In this case, we have to determine the explicit inverse of the Laplacian matrix LTL, which is numerically inaccurate and computationally impractical because L is a large sparse matrix. Another problem of this approach is that L+ is no longer a sparse matrix, which requires an extremely large memory space for processing and storage.
Instead of determining L+ as the decomposition dictionary D, we can construct it directly from the Laplacian spectrum constraint. In the ideal case, the optimal x can be represented by a linear combination of L's 0 eigenvectors, that is, x=Σi=1kαivi, where vi is the eigenvector corresponding to the ith smallest eigenvalue λi of L.
This property can be easily extended to x=Dα in matrix form, where
D=[v
1
, . . . , v
k′
],k′?k. (14)
As long as k′ is much larger than k, we have a sparse vector α, which can be estimated efficiently.
The equation x=Dα indicates that the final estimate x is actually an approximate projection of x* onto the null-space of L, because we limit the number of nonzero values in α, and vm (m>k) may also contribute to the final estimate x. Compared with the approach that directly projects x* onto L's null-space, this approach is more accurate and can solve Eq. (2) without explicitly determining the nullity k of L.
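A sketch of this dictionary decomposition using an off-the-shelf l1-regularized solver; the value of k′, the Lasso penalty, and the use of scikit-learn are illustrative assumptions.

    import scipy.sparse.linalg as spla
    from sklearn.linear_model import Lasso

    def sparse_decomposition_segment(L, x_star, k_prime=50, lam=1e-3):
        # Dictionary D = [v1, ..., vk'] of the k' eigenvectors of L with smallest
        # eigenvalues (Eq. 14), so that x = D alpha with a sparse alpha.
        _, D = spla.eigsh(L, k=k_prime, sigma=-1e-6)
        alpha = Lasso(alpha=lam, fit_intercept=False).fit(D, x_star).coef_
        return D @ alpha   # approximate projection of x* onto null(L)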
Robust Function
Because the residual δ=|x−x*| has many spatially continuous large outliers and the least-squares data fidelity term weights each sample with a quadratic norm, the final estimate of Eq. (5) can be distorted severely. Depending on its quality, the prior information x* can contain incomplete and inaccurate indicators, for instance strong responses across the segment boundaries. This can cause mislabeling.
A better option is to weight large outliers less and use the structure information from the Laplacian spectrum constraint to recover x. Therefore, we adapt principles from robust statistics. As known in the art, robust statistics are typically applied to data drawn from a wide range of probability distributions, and especially for distributions that are not normally distributed. As an advantage, robust statistics are not unduly affected by outliers, see e.g., U.S. Pat. Nos. 8,340,4168, 8,194,097, and 8,078,002.
We use a robust function to replace the least square cost as
where ρ is the robust function, e.g., a Huber function, a Cauchy function, l1, or other M-estimators. We prefer the Huber function because it is parabolic in the vicinity of 0, and increases linearly when δ is large. Thus, the effects of large outliers can be eliminated significantly. We define the weight function w=[w1, . . . , wn]T at each pixel associated with the Huber function as
When written in matrix form, we use a diagonal weighting matrix W=diag(w1, . . . , wn) to represent the Huber weight function. Therefore, the data fidelity term can be simplified as ρ(x−x*)=∥W(x−x*)∥2. As a result, Eq. (15) can be solved efficiently in an iterative least-squares approach. At each iteration, x is updated as
x=(βLTL+W)−1Wx*. (17)
If we set W=I, then Eq. (17) is the same as Eq. (6). The pseudocode for the robust function (15) is shown in
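An iteratively reweighted least-squares sketch of Eqs. (15)-(17) with the Huber weights described above; the threshold delta, the penalty beta, the iteration count, and the solver are illustrative assumptions.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def robust_segment(L, x_star, beta=1e3, delta=0.1, iters=10):
        # Iteratively reweighted least squares with Huber weights (Eqs. 15-17).
        LtL = (L.T @ L).tocsc()
        x = x_star.copy()
        for _ in range(iters):
            r = np.abs(x - x_star)                                        # residual |x - x*|
            w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))   # Huber weights
            W = sp.diags(w, format='csc')
            x = spsolve((beta * LtL + W).tocsc(), w * x_star)             # Eq. (17)
        return x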
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.