Image segmentation is an image processing technique used in a wide variety of industries, including medical image analysis, satellite imagery, visual surveillance, and face recognition systems. Image segmentation partitions a digital image into multiple regions based on a homogeneity metric. Each region corresponds to a set of pixels of the digital image, where desirably all the pixels of the digital image are encompassed by the multiple regions as a whole. This low-level abstraction of the image permits high-level semantic operations to be performed with a reduced and relevant set of data.
Existing techniques for color image segmentation include feature-based, edge-based, region-based, and hybrid segmentation approaches. The latter segmentation approach may employ two or more of the feature-based, edge-based, and region-based segmentation approaches. Each of these approaches to segmenting an image has disadvantages, however. Some of the segmentation approaches yield less than optimal image segmentation. Others of these segmentation approaches yield better image segmentation, but at the expense of increased processing time.
The method 100 receives an image (102). In one embodiment, data can be received that corresponds to the image. The image has a number of pixels, and a number of color channels, such as the red, green, and blue color channels, as can be appreciated by those of ordinary skill within the art. Each pixel thus has red, green, and blue color values corresponding to the red, green, and blue color channels of the image. For example, an eight-bit image may have red, green, and blue values for each pixel that are between zero and 2^8−1, or 255.
The method 100 then initially segments the image into a number of initial regions (104). These initial regions are typically relatively small in size and relatively large in number, and are later merged to yield what are referred to as the merged regions to which the final segmentation of the image corresponds. In one embodiment, data can be generated as corresponding to these initial regions. The image is initially segmented into the initial regions at least by dynamically selecting seeds within the image using a dynamic color gradient threshold, and growing the initial regions from these seeds until the initial regions as a whole encompass all the pixels of the image. Such initial segmentation of the image into initial regions is now described in more detail.
An edge map of the image is generated (114). The edge map is used to define the initial regions that are utilized as a starting point for the remainder of the segmentation of the image. The edge map of the image particularly defines the edges of different features within the image, such as different objects within the image.
In one embodiment, the edge map is generated as follows. It is presumed that the image is a function f(x, y). Therefore, the edges of the image can be defined in terms of the first derivative of this function.
The magnitude of the gradient is selected to ensure rotational invariance. For a vector field f, the gradient vector can be defined as:
In equation (1), Djfk is the first partial derivative of the kth component of f with respect to the jth component of x. Moving from the point x along a unit vector u in the spatial domain, the corresponding distance traveled in the color domain is d=√(uTDTDu). The vector that maximizes this distance is the eigenvector of the matrix DTD that corresponds to its largest eigenvalue.
In the special case of an image having red, green, and blue color channels, which is a color image that can be referred to as an RGB image, the gradient can be determined in the following manner. First, u, v, w denote each color channel and x, y denote the spatial coordinates for a pixel of the image. The following variables are further defined to simplify the expression of the final solution:
Therefore, the matrix DTD becomes
And its largest eigenvalue λ is
By calculating λ, the largest differentiation of colors is obtained and the edges of the image can be defined as
G=√λ  (7)
Thus, the magnitude of the gradient G(i, j) is used to obtain the edge map of the image.
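The edge map computation described above can be sketched as follows. This is an illustrative implementation rather than the only possible one; the function name is an assumption, the image is taken to be a NumPy array of shape (H, W, 3), and the closed-form largest eigenvalue of the symmetric 2×2 matrix DTD is used.

```python
import numpy as np

def color_gradient_edge_map(image):
    """Edge map G = sqrt(lambda_max) of the color gradient matrix D^T D.

    image: float array of shape (H, W, 3) holding the color channels.
    """
    # First partial derivatives of each color channel along y and x.
    dy, dx = [], []
    for c in range(image.shape[2]):
        gy, gx = np.gradient(image[:, :, c])
        dy.append(gy)
        dx.append(gx)
    dy, dx = np.array(dy), np.array(dx)

    # Entries of the 2x2 matrix D^T D at every pixel.
    gxx = np.sum(dx * dx, axis=0)
    gyy = np.sum(dy * dy, axis=0)
    gxy = np.sum(dx * dy, axis=0)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = 0.5 * (gxx + gyy + np.sqrt((gxx - gyy) ** 2 + 4.0 * gxy ** 2))
    return np.sqrt(lam)
```

A flat image yields an all-zero edge map, while color discontinuities produce large values of G at the corresponding pixels.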
A dynamic color gradient threshold is selected so that the initial regions within the image are able to be selected such that no initial region encompasses any of the edges of the image (116). The dynamic color gradient threshold corresponds to a discrete gray level of the image when the image is considered in grayscale. That is, and more specifically, the dynamic color gradient threshold is applied to the edge map of the image; it is initially set low to account for areas of the image that include no edges. For example, for a given pixel of the image, there are red, green, and blue color values. However, this pixel also has a grayscale component based on the red, green, and blue color values when the color information of these color values is removed, as can be appreciated by those of ordinary skill within the art. The dynamic color gradient threshold is thus a color gradient threshold in that the grayscale components of the pixels of the image that are compared against this threshold are generated from—i.e., are based on—the color values of the pixels. The color gradient threshold is further dynamic in that it is increased, one discrete gray level at a time, as the initial regions are subsequently grown.
The initial regions are identified by clustering pixels of the image that fall below the dynamic color gradient threshold selected, such that no initial region includes or encompasses any of the edges of the image as defined by the edge map. In one embodiment, the dynamic color gradient threshold is selected in part 116 so that it is the smallest, or lowest, such threshold that permits selection of the initial regions within the image that do not encompass any edges of the image, as defined by the edge map. Once this dynamic color gradient threshold is selected, then, the initial regions are selected, where each initial region includes an initial seed (118).
Because the dynamic color gradient threshold has been selected in part 116 so that no initial region encompasses any of the edges of the image, the initial regions are selected in part 118 so that they do not include any edges of the image. The selection of the initial regions in part 118 is thus constrained by the edge map generated in part 114.
Each initial region is said to include an initial seed, where a seed is also defined as a cluster of one or more pixels of the image. Each initial seed is defined as one of the initial regions, and vice-versa. In one embodiment, to prevent multiple seed generation within homogeneous and connected regions, which should form a single initial region, the initial seeds (i.e., the initial regions prior to their being grown) are selected in part 118 as clusters of pixels that are larger than 0.5%, or another predetermined percentage, of the image.
Each such individual cluster of pixels is assigned a particular label for differentiation purposes, and this resulting label map is referred to as the parent seeds, or PS, map. Thus, the initial seeds correspond to these individual clusters of pixels. In one embodiment, as can be appreciated by those of ordinary skill within the art, the labeling process may be performed by run-length encoding the image, and then scanning the runs to assign preliminary labels and to record label equivalences in a local equivalence table. Thereafter, the equivalence classes are resolved, and the runs relabeled based on the resolved equivalence classes.
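The selection and labeling of the initial seeds can be sketched as follows. The text describes run-length encoding with an equivalence table; for brevity, the sketch below instead uses an equivalent flood-fill labeling that produces the same parent seeds (PS) map. The function name and the 4-connectivity choice are illustrative assumptions.

```python
import numpy as np
from collections import deque

def select_initial_seeds(edge_map, threshold, min_fraction=0.005):
    """Label clusters of below-threshold pixels; keep those larger than
    min_fraction (0.5% in the text) of the image as initial seeds.

    Returns an integer label map (the PS map), with 0 for unassigned pixels.
    """
    below = edge_map < threshold
    labels = np.zeros(edge_map.shape, dtype=int)
    min_size = min_fraction * edge_map.size
    next_label = 1
    h, w = edge_map.shape
    for i in range(h):
        for j in range(w):
            if below[i, j] and labels[i, j] == 0:
                # Flood-fill one 4-connected cluster.
                cluster = []
                queue = deque([(i, j)])
                labels[i, j] = -1  # visited marker
                while queue:
                    y, x = queue.popleft()
                    cluster.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and below[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = -1
                            queue.append((ny, nx))
                # Keep only clusters larger than the minimum size.
                value = next_label if len(cluster) > min_size else -1
                if value > 0:
                    next_label += 1
                for y, x in cluster:
                    labels[y, x] = value
    labels[labels < 0] = 0  # rejected clusters remain unassigned
    return labels
```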
Prior to growing the initial regions corresponding to the initial seeds, the method proceeds as follows.
Next, the initial regions that have been identified are grown to include more pixels of the image. Specifically, the dynamic color gradient threshold, originally selected in part 116, is increased by one gray level (122), such as from 15 to 16, from 16 to 17, and so on. Areas of the image that are adjacent to the seeds are located (124). That is, for each seed that has been assigned to an initial region, an area of the image that is adjacent to that seed is located. Each area is then merged into the initial region to which the seed adjacent to the area in question has been assigned (126). In this way, the initial regions are “grown” to encompass other portions of the image that are not initially part of these seeds. In particular, it is noted that such region growth does not depend exclusively on the initial assignment of clusters (i.e., the initial seeds) for the final segmentation of the image. The “seeds” referred to in part 126 are existing seeds, and at first include just the initial seeds that have been determined in part 118, but subsequently include additional (new) seeds that are generated when part 134 is performed, as is described in more detail later in the detailed description.
In one embodiment, determining the areas of the image that are adjacent to the seeds and merging these areas into the initial regions to which the seeds have been assigned can be achieved as follows. Child seeds are selected that fall below the dynamic color gradient threshold, which was previously advanced to the next discrete gray level in part 122. These child seeds are classified as either adjacent or non-adjacent to existing seeds. It can thus be important to know the existing seed to which each such child seed is adjacent. The objective in this sense is to be able to process all the adjacent child seeds in a vectorized manner.
To achieve this task, the outside edges of the PS map that has previously been generated are detected, using a nonlinear spatial filter. The filter operates on the pixels of an n×n neighborhood, such as a 3×3 neighborhood, and the response of its operation is assigned to the center pixel of the neighborhood. The filter operates according to
In equation (8), β is the neighborhood being operated on. The result of applying this filter is a mask indicating the borders of the PS map.
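Because equation (8) itself is not reproduced above, the following sketch assumes one plausible reading of the filter: the response at a center pixel is set when the pixel is background but its 3×3 neighborhood β contains a seed label, which yields a mask of the outside borders of the PS map. The function name is illustrative.

```python
import numpy as np

def ps_border_mask(ps_map):
    """Mask of background pixels that are 8-adjacent to a labeled seed,
    i.e. the outside borders of the PS map (one plausible reading of the
    nonlinear spatial filter of equation (8): the response at a center
    pixel is 1 when its 3x3 neighborhood contains a seed label but the
    center itself does not).
    """
    h, w = ps_map.shape
    padded = np.pad(ps_map, 1, mode='constant')
    mask = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            neighborhood = padded[i:i + 3, j:j + 3]
            mask[i, j] = (ps_map[i, j] == 0) and neighborhood.max() > 0
    return mask
```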
The child seeds are individually labeled, and the ones adjacent to the existing seeds are identified by performing an element-by-element multiplication of the parent seeds edge mask and the labeled child map. The pixels remaining after this multiplication are referred to as the adjacent child pixels, and the pixels whose labels are members of the set of labels remaining after the multiplication become part of the adjacent child seeds map. For the proper addition of adjacent child seeds, their colors may be compared to those of their parent seeds to assure homogeneous segmentation. Reduction of the number of seeds to be evaluated is achieved by attaching to the existing (parent) seeds the child seeds that have a size smaller than the minimum seed size, or MSS. In one embodiment, the MSS may be set to 0.01% of the image.
The child seed sizes are determined utilizing sparse matrix storage techniques, as can be appreciated by those of ordinary skill within the art, to provide for the creation of large matrices with low memory costs. Sparse matrices store just their nonzero elements, together with the location of these nonzero elements, which are referred to as indices. The size of each child seed is determined by creating a matrix of M×N columns by C rows, where M is the number of columns of pixels within the image itself, N is the number of rows of pixels within the image, and C is the number of adjacent child seeds. The matrix is created by allocating a one at each column in the row that matches the pixel label. Pixels that do not have labels are ignored. By summing all the elements along each row, the number of pixels per child seed is obtained.
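The per-seed pixel counts described above can also be obtained without explicitly forming the sparse indicator matrix: a histogram over the labels produces the same row sums. The sketch below uses NumPy's bincount as a stand-in for the sparse-matrix bookkeeping; the function name is illustrative.

```python
import numpy as np

def child_seed_sizes(child_map):
    """Number of pixels per child seed. child_map holds integer labels,
    with 0 for unlabeled pixels (which are ignored).

    Stands in for the sparse (M*N x C) indicator matrix described in the
    text: summing that matrix along each row likewise counts the pixels
    belonging to each label.
    """
    labels = child_map[child_map > 0]
    counts = np.bincount(labels.ravel())
    # counts[k] is the size of child seed k; counts[0] is unused.
    return counts
```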
To attach regions together, an association between child seeds and their existing parent seeds may be needed. The adjacent child pixels provide the child labels, but not the parent labels. Another spatial filter is applied to the PS map to obtain the parent labels. The filter response at each center point is equal to the maximum pixel value in its neighborhood. The association between a child seed and its existing parent seed can then be obtained by creating a matrix with the first column composed of the adjacent child pixels, and the second column with the labels found at the location of the adjacent child pixels in the matrix obtained after applying the maximum value filter to the PS map. It is noted that the use of non-linear filters can provide information about the seeds without directly manipulating the image, such that the final image segmentation is not affected.
The functionality of the association matrix is manifold. It provides the number of child pixels that are attached to existing parent seeds, and also identifies which child seeds share edges with more than one existing parent seed. Child seeds smaller than the MSS can now be directly attached to their existing parent seeds. Child seeds that share less than a predetermined number of pixels with their parent seeds, such as five pixels, and that are larger than the MSS are returned to the un-segmented region, to be processed once the region shares a more significant border. The remaining child seeds are compared to their parent seeds to determine whether they should be added.
Given that regions in images vary spatially in a gradual manner, just the nearby area of adjacency between a parent seed and a child seed is compared, to provide a true representation of the color difference. This objective can be achieved by using two masks that exclude the areas of both parent seeds and child seeds that are distant from their common boundaries. The first mask is a dilation of the PS map using an octagonal structuring element with a distance of a predetermined number of pixels, such as 15 pixels, between the center pixel and the sides of the octagon, as measured along the horizontal and vertical axes. The second mask is the same dilation, but applied to the adjacent child seeds map. The two masks mutually exclude the pixels that fall beyond each other's dilation masks. In one embodiment, such masks where the distance is set to 15 pixels have been found to perform well for images that are 300-by-300 pixels in size to images that are 1,000-by-1,000 pixels in size.
The comparison of regions can be performed using the Euclidean distance between the mean colors of the clusters, or areas, being compared. In one embodiment, prior to this comparison being performed, the image may be converted to the CIE L*a*b* color space, as known within the art, to assure that comparing colors using the Euclidean distance is similar to the differentiation of colors by the human visual system. The maximum color distance that allows the integration of the child seed into the parent seed is in one embodiment set to 20. This distance is selected more generally to allow the differentiation of at least a number of different colors, such as ten different colors, along the range of the a* channel or the b* channel.
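The color comparison can be sketched as follows, assuming the image has already been converted to the CIE L*a*b* color space (for example, by an image processing library); the function name and the boolean-mask representation of the two areas are illustrative assumptions.

```python
import numpy as np

def should_merge(lab_image, parent_mask, child_mask, max_distance=20.0):
    """Compare the mean colors of a parent area and a child area in the
    CIE L*a*b* space; the child is attached when the Euclidean distance
    between the means does not exceed max_distance (20 in the text).

    lab_image: (H, W, 3) array already converted to L*a*b*.
    parent_mask, child_mask: boolean (H, W) masks of the two areas,
    e.g. already restricted to the dilated common-boundary regions.
    """
    parent_mean = lab_image[parent_mask].mean(axis=0)
    child_mean = lab_image[child_mask].mean(axis=0)
    return np.linalg.norm(parent_mean - child_mean) <= max_distance
```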
At some point, the initial regions are sufficiently grown to encompass all the pixels of the image (128), at which time the initial segmentation of the image is complete.
If the initial regions still do not encompass all the pixels of the image (128), however, and all the existing seeds have been processed such that the dynamic color gradient threshold is not exceeded for any existing pixel (132), then new seeds are detected within the image (134), as is now described.
The new areas that are detected in part 134 are selected so that they fall below the value of the new seed threshold. All such regions that are not attached to any existing seeds and are larger than the MSS are added to the PS map. Furthermore, new seeds that share borders with existing seeds may still be added, provided that they represent areas large enough to become initial regions by themselves, and that the color differences between such areas and their neighbors are greater than the maximum color difference allowed for defining a single region.
It is noted that region growth without feedback regarding the growth rate of each current seed may cause existing seeds to overflow into regions of similar colors, but different textures. Each region in an image may display a similar density throughout the region. Therefore, to maintain homogeneity, the regions that are created at low gradient levels, after the growth rate has stabilized, are classified as grown seeds and removed from the growth process. As such, size tracking of each seed may be performed each time new (i.e., dynamic) seeds are added. The number of pixels per seed is thus determined at each such interval, and when the increment of a given existing seed does not reach a predetermined percentage of its original size, such as 5%, the growth of the seed is stopped. When the last interval has been reached, all the identifiable regions have been provided a label, and all remaining areas are edges of the segmented regions. At this stage, then, all the seeds may nevertheless be allowed to grow to complete the region growth process.
Once the region growth process is complete, the colors of the image are quantized so that a texture channel of the image can be generated, as is now described.
The colors of an image are quantized into a number of quantized colors (142). For example, an eight-bit image has 2^8, or 256, colors for each of its red, green, and blue color channels. Each color channel can have its values represented as a number of quantized ranges: {0 . . . a}, {a+1 . . . b}, . . . , {m+1 . . . n}. For an eight-bit image, the 256 colors of each color channel can be represented as five quantized ranges: {0 . . . 51}, {52 . . . 102}, {103 . . . 153}, {154 . . . 204}, {205 . . . 255}. As such, the values of a given color channel that are in the first range are assigned to the first quantized color, the values of a given color channel between 52 and 102 are assigned to the second quantized color, the values of a given color channel between 103 and 153 are assigned to the third quantized color, and so on. Furthermore, because each pixel of the image has three color values (r, g, b) corresponding to the red, green, and blue color channels of the image, each pixel of an eight-bit image for which there are five quantized ranges for each color channel can be quantized into one of 5^3=125 different quantized colors.
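The quantization into 5^3=125 colors can be sketched as follows; the function name is an illustrative assumption, and integer division is used to reproduce the five nearly equal ranges per channel given above.

```python
import numpy as np

def quantize_colors(image, bins=5):
    """Map each 8-bit RGB pixel to one of bins**3 quantized colors
    (125 for the five ranges per channel used in the text).

    image: integer array of shape (H, W, 3) with values in 0..255.
    """
    # 256 levels split into `bins` nearly equal ranges: {0..51}, {52..102}, ...
    indices = np.minimum(image.astype(int) * bins // 256, bins - 1)
    r, g, b = indices[..., 0], indices[..., 1], indices[..., 2]
    # Combine the three per-channel indices into a single quantized color.
    return r * bins * bins + g * bins + b
```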
A texture channel of the image is next generated based on the entropy of the quantized colors of the image. Entropy is used for this purpose as follows.
Entropy is a quantity defined in information theory, as can be appreciated by those of ordinary skill within the art. A random group of pixels s can be selected from an image, with a set of possible values {a1, a2, . . . , aj}. The probability for a specific value aj to occur is P(aj), and its occurrence contains

l(aj)=−log P(aj)  (9)

units of information. The quantity l(aj) is referred to as the self-information of aj. If k values are drawn from the set, the law of large numbers stipulates that each value aj is likely to occur approximately kP(aj) times. Thus, the total self-information obtained from k inputs is
−kP(a1) log P(a1)− . . . −kP(aj)log P(aj) (10)
Furthermore, the average information per sample, or entropy of the set, is defined by dividing expression (10) by k:

H=−Σj P(aj)log P(aj)  (11)
This quantity is defined for a single random variable. However, in relation to the image that is the subject of the method 100, multiple variables are being worked with. Therefore, to take advantage of the color information without extending the process to determine the joint entropy, the colors in the image have been quantized as described above. This quantization of colors can be achieved in part 142 by dividing the RGB color cube into a number of sub-cubes, such as the 125 sub-cubes corresponding to the five quantized ranges per color channel, and mapping each pixel to the sub-cube within which its color falls.
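The entropy of equation (11) can be computed directly from a sample of quantized colors; the sketch below estimates P(aj) from the observed frequencies and uses base-2 logarithms (an assumption, since the text does not fix the base).

```python
import numpy as np

def entropy_of(values):
    """Entropy H = -sum_j P(a_j) log2 P(a_j) of a group of samples,
    per equation (11), with P(a_j) estimated from observed frequencies."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

Applied over a local neighborhood of quantized colors around each pixel, such a measure yields a per-pixel texture value.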
One-way multivariate analysis of variance of the color channels and the texture channel of the image is then performed within each initial region (146), to obtain distance values between pairs of initial regions.
To describe how one-way multivariate variance analysis can be performed, the more basic one-way variance analysis is described, and those of ordinary skill within the art can appreciate that one-way multivariate variance analysis just extends such one-way variance analysis, as is described herein as well. The general case is considered in which p variables x1, x2, . . . , xp are measured on each individual of each group, in any direction of the p-dimensional sample space of the groups that is specified by the p-tuple (a1, a2, . . . , ap). Each multivariate observation x′i=(xi1, xi2, . . . , xip) can be converted into a univariate observation yi=a′xi, where a′=(a1, a2, . . . , ap). Because the samples are divided into g separate groups, it is useful to relabel each element using the notation yij, where i refers to the group that the element belongs to, and j is the location of the element in the ith group.
The objective of one-way variance analysis is to locate the optimal coefficients of the vector a that will yield the largest differences across groups and minimize the distances of elements within each group. To achieve this, the between-groups sum-of-squares and products matrix B0 and the within-groups sum-of-squares and products matrix W0 are defined by
In equations (12) and (13), the labeling of xij is analogous to that of yij, x̄i is the sample mean vector in the ith group, and x̄ is the overall sample mean vector. Since yij=a′xij, it can be verified that the between-groups and within-groups sums of squares become
SSB(a)=a′B0a and SSW(a)=a′W0a (14)
With n sample members and g groups, there are (g−1) and (n−g) degrees of freedom between and within groups respectively. A test of the null hypothesis that there are no differences in mean value among the g groups is obtained from the mean square ratio
In equation (15), B is the between-groups covariance matrix and W is the within-groups covariance matrix. F is maximized with respect to a by differentiating F and setting the result to zero.
However, at the maximum of F, the ratio a′Ba/a′Wa has to be a constant λ, so the required value of a has to satisfy

(B−λW)a=0  (16)

Equation (16) can be rewritten as (W−1B−λI)a=0, so λ has to be an eigenvalue of W−1B, and a has to be the eigenvector corresponding to its largest eigenvalue. This result provides the direction in the p-dimensional data space that tends to keep the distances between the elements within each class small, while simultaneously maintaining the distances between the classes as large as possible.
In the case where g is large, or if the original dimensionality is large, a single direction provides a gross over-simplification of the true multivariate configuration. The matrix W−1B in equation (16) generally possesses more than one eigenvalue/eigenvector pair that can be used to generate multiple differentiating directions. Suppose that λ1>λ2> . . . >λs>0 are the eigenvalues associated with the eigenvectors a1, a2, . . . , as. If new variates y1, y2, . . . , ys are defined by yi=a′ix, then the yi are termed canonical variates.
Thus, all the eigenvalues λi and eigenvectors ai are gathered together so that ai is the ith column of a (p×s) matrix A, while λi is the ith diagonal element of the (s×s) diagonal matrix L. Then, in terms of x, equation (16) may be written as BA=WAL, and the collection of canonical variates is given by y=A′x. The space of all vectors y is termed the canonical variate space. In this space, the mean of the ith group of individuals is ȳi=A′x̄i.
Now, the Mahalanobis squared distance between the ith and jth groups is given by

D2=(x̄i−x̄j)′W−1(x̄i−x̄j)  (17)

The squared Euclidean distance between the corresponding group means in the canonical variate space is (ȳi−ȳj)′(ȳi−ȳj); substituting ȳi=A′x̄i yields (x̄i−x̄j)′AA′(x̄i−x̄j).
However, it can be proven that AA′≡W−1. Thus, substituting for AA′ above yields equation (17). As such, by constructing the canonical variate space in the way described, the Euclidean distance between the group means is equivalent to the Mahalanobis distance in the original space. Obtaining the Mahalanobis distance between groups is beneficial, because it accounts for the covariance between variables, as well as for differential variances, and is a good measure of distance between two multivariate populations.
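The Mahalanobis squared distance of equation (17) can be computed directly from the groups' observations, without constructing the canonical variate space; a sketch under the assumption that each group is an (n_i × p) array of observations (the function names are illustrative):

```python
import numpy as np

def within_groups_cov(groups):
    """Pooled within-groups covariance matrix W from a list of (n_i, p)
    arrays, using (n - g) degrees of freedom as in the text."""
    n = sum(len(g) for g in groups)
    g_count = len(groups)
    # np.cov with bias=True divides by n_i; multiplying by n_i restores
    # the raw scatter matrix of each group, summed to give W0.
    W0 = sum(np.cov(g, rowvar=False, bias=True) * len(g) for g in groups)
    return W0 / (n - g_count)

def mahalanobis_sq(groups, i, j):
    """Equation (17): D^2 = (xbar_i - xbar_j)' W^{-1} (xbar_i - xbar_j)."""
    W = within_groups_cov(groups)
    d = groups[i].mean(axis=0) - groups[j].mean(axis=0)
    return float(d @ np.linalg.solve(W, d))
```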
It is noted that the segmentation performed in the method 100 up to this point has been performed in the absence of information regarding the individual initial regions. Now that the image has been segmented into the different initial regions, information can be gathered from each individual initial region. There are four sources of information: the red, green, and blue color channels, and the texture channel. There are also individual initial regions having different numbers of pixels. This data can be modeled using an (N×P) matrix, where N is the total number of pixels within the image, and P is the total number of variables that contain information about each pixel. Thus, where G is the total number of initial regions into which the image has already been segmented, the matrix is composed of G separate sets. As such, a mean value for each individual region is obtained, and the result is used to compare the different individual regions. This is achieved by performing one-way multivariate analysis of variance of the color channels and the texture channel of the image within each initial region in part 146, as such one-way multivariate analysis has been described above.
From this one-way analysis, the Mahalanobis squared distance values for the pairs of initial regions are obtained. The initial regions are then iteratively merged based on these distance values, as is now described.
In general, the individual regions of the pair of initial regions having the smallest, or lowest, distance value are merged together. One-way multivariate analysis of variance is then performed in relation to this new merged region and the other initial regions that were not merged together. The individual regions of the pair of regions having the smallest distance value are again merged together, and one-way multivariate analysis of variance is again performed in relation to the new merged region and the other regions that were not merged together. This iterative process continues until the number of regions is no longer greater than a predetermined number of regions into which the image is desired to be segmented. In one embodiment, for instance, the number of regions into which an image is desired to be segmented may be selected by the user.
However, this iterative process can be computationally taxing, because a one-way multivariate variance analysis is performed each time two regions are merged together. This is because, once a region has been merged with another region, the similarity of this newly merged region to the other regions is unknown, but is needed if the newly merged region is to be merged with other regions later. Therefore, in one embodiment, an alternative approach is employed to prevent the Mahalanobis distance values from having to be reevaluated after each region merging has occurred, as is now described in detail.
First, the initial regions into which the image has already initially been segmented are referred to as working regions (152), for descriptive convenience and clarity. Thereafter, one-way multivariate analysis of variance of the color channels and the texture channel of the image is performed within each working region (154), as has been described in relation to part 146. The result is that there is a Mahalanobis squared distance value, or another type of distance value, for each pair of working regions.
Thereafter, a predetermined number of pairs of working regions is selected (156), and is referred to as the current set of pairs of working regions for descriptive clarity and convenience. This predetermined number of pairs of working regions has the smallest, or lowest, distance values of all the pairs of working regions. In one embodiment, the predetermined number may be five, which has been found to be an adequate number of pairs of working regions to reduce computational time, while still adequately if not optimally merging working regions together.
The pairs of working regions within the current set are ordered from the pair of working regions within the current set that encompasses a smallest number of pixels (i.e., the first pair) to the pair of working regions within the current set that encompasses a largest number of pixels (i.e., the last pair) (158). The working regions of the first pair of the current set are merged together (159), to yield a new working region that replaces both the working regions of this first pair. The next pair of working regions within the current set is then selected (160), and referred to as the current pair of working regions of the current set, for descriptive convenience and clarity.
If either working region of the current pair of working regions is encompassed by a new working region previously generated by merging, then the other working region of this current pair is merged into this new working region, if the other working region is not already part of this new working region (161). For example, the current pair of working regions may include working regions a and b. There may be one new working region c that was previously generated. Therefore, in part 161, if a is part of c already, then the working region b is added to the new working region c, if the working region b is not already part of c. However, if neither working region of the current pair of working regions is encompassed by a new working region previously generated by merging, then these two working regions are merged together (162), to yield another new working region that replaces both the working regions of this current pair.
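Parts 159 through 162 can be sketched as follows, for a current set of pairs already ordered from smallest to largest pixel count. The region labels are illustrative integers, and the sketch simplifies part 161 by absorbing a pair into the first merged region that contains either of its members.

```python
def merge_pairs(pairs):
    """Merge a current set of working-region pairs (ordered smallest to
    largest pixel count) following parts 159-162: if either region of a
    pair already belongs to a new merged region, the other region joins
    that merged region; otherwise the pair forms a new merged region."""
    merged = []  # list of sets of region labels
    for a, b in pairs:
        home = next((m for m in merged if a in m or b in m), None)
        if home is not None:
            home.update((a, b))      # part 161: absorb the other region
        else:
            merged.append({a, b})    # part 162: form a new merged region
    return merged
```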
If the current pair is not the last pair of the current set of pairs of working regions (163), then the method proceeds back to part 160, at which the next pair of working regions within the current set is selected and processed as has been described.
Therefore, in this approach, the one-way multivariate analysis of variance is performed once per current set of pairs of working regions, rather than after every merging of two working regions, so that the distance values do not have to be reevaluated each time two working regions are merged.
This process is repeated until the number of working regions is no longer greater than the predetermined number of regions into which the image is desired to be segmented; the resulting working regions are the merged regions to which the final segmentation of the image corresponds.
The method 100 can conclude in one embodiment by outputting the merged regions that have been generated as the segmentation of the image in question (110). For example, in one embodiment, this segmentation may be stored on a computer-readable medium, for subsequent processing by one or more other computer programs. As another example, the segmentation may be displayed on a display device for a user to view the merged regions. As a third example, the merged regions may be transmitted over a network to a computing device other than that which performed the method 100, so that this other computing device can perform further processing on the image in question as segmented into the merged regions. These and other examples are all encompassed by the statement that the merged regions into which the image has been segmented are output.
It is finally noted that the image segmentation approach that has been described herein has been found to provide satisfactory, good, and/or optimal results for a wide variety of different color images. Furthermore, and just as advantageously, the image segmentation approach is performed relatively quickly, even with modest computing power. As such, embodiments of the present disclosure are advantageous because they provide both good results and fast processing times.