The present disclosure relates to merging clusters of related objects (such as images), and more specifically to application of cluster merging criteria.
In the field of data analysis and retrieval, it is common to perform clustering to help describe features of multiple objects in a generalized manner. In particular, objects that are similar are grouped in clusters so that objects may be represented in a more compact way. For example, in the context of images, a cluster may be referred to as a “visual word” because it represents a general visual concept. A series of visual words may be used to construct a “visual vocabulary” for describing or comparing images.
In one example, clustering is performed by “K-means” clustering. K-means clustering aims to partition n objects into k clusters based on respective data points corresponding to one or more objects. Specifically, in K-means clustering for images, each data point corresponding to an image feature is assigned to a cluster with the nearest centroid (arithmetic mean of all points in the cluster). When all points have been assigned, the positions of the centroids are recalculated. The assigning of points and recalculation of centroids are iterated until the centroids no longer move.
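By way of illustration, the K-means iteration described above can be sketched as follows. This is a generic sketch, not the implementation of any particular embodiment; the function name and parameters are illustrative.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain K-means: assign each point to the cluster with the nearest
    centroid, then recompute centroids, until assignments stabilize."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initial centroids from the data
    assign = [None] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assign = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2
                                  for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_assign == assign:               # centroids no longer move
            break
        assign = new_assign
        # Update step: centroid = arithmetic mean of assigned points.
        for c in range(k):
            members = [pt for pt, a in zip(points, assign) if a == c]
            if members:                        # keep old centroid if emptied
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, assign
```

On two well-separated groups of points, the iteration converges to one cluster per group regardless of the random initialization.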
Nevertheless, K-means clustering and other conventional clustering methods often result in cluster sets which are impractical or undesirable. For example, with images, the number of clusters generated by conventional means may not be suitable for a visual vocabulary. If too few clusters are generated, the visual vocabulary is not descriptive enough. If too many clusters are generated, the visual words are very small and cover an overly specific set of visual features. Similar shortcomings may occur when describing other types of objects (e.g., audio files).
The foregoing situation is addressed by determining whether to merge clusters of objects based on both a cluster compactness measure and a cluster quality measure.
Thus, in an example embodiment described herein, a determination is made as to whether to merge clusters of objects. Semantic information is input for at least one of the objects. A compactness of a candidate cluster to be formed when a first cluster and a second cluster are merged is evaluated. A cluster quality of the candidate cluster is evaluated, based on the semantic information. The first cluster and the second cluster are merged in a case that the compactness of the candidate cluster relative to a compactness of the first and second clusters exceeds a compactness threshold, and the cluster quality of the candidate cluster relative to a cluster quality of the first and second clusters exceeds a cluster quality threshold.
By determining whether to merge clusters of objects based on both a cluster compactness measure and a cluster quality measure, it is ordinarily possible to create a vocabulary with an appropriate number of clusters. For example, when clustering images, it is ordinarily possible to create a visual vocabulary which generalizes when necessary (e.g. when there is insufficient data to be more specific or too much noise or variation to be more specific), but also has a sufficient number of visual words to describe different visual features.
In one example aspect, the compactness threshold is based on a number of objects in the first cluster, the number of objects overall, and the number of dimensions of the objects.
In other example aspects, the semantic information describes one or more semantic labels of an object, such as an image. In another example aspect, the semantic information of one or more objects in the first cluster is related to the semantic information of one or more objects in the second cluster.

In still another example aspect, cluster compactness is evaluated based at least on an average standard deviation in all dimensions of one or more object features in a cluster.
In yet another example aspect, cluster compactness is evaluated based at least on a standard deviation in a direction of a line connecting the center of the first cluster and the center of the second cluster in a vector space defined by the first cluster and the second cluster.
In other example aspects, the cluster quality is based on a Rand Index, a Relational Rand Index, or a Mutual Information measure, and the cluster quality threshold is based on an expected Rand Index, an expected Relational Rand Index, or an expected Mutual Information measure. Some of these concepts are known, whereas others are defined herein.
In still another example aspect, an existing cluster of objects is split into a plurality of clusters. Semantic information is input for at least one of the objects in the existing cluster. A respective compactness is evaluated for each of a first candidate cluster and a second candidate cluster to be formed when the existing cluster is split. A respective cluster quality of each of the first candidate cluster and the second candidate cluster is evaluated based on the semantic information. The existing cluster is split in a case that the respective compactness of the first candidate cluster and of the second candidate cluster relative to the compactness of the existing cluster each exceeds a compactness threshold, and the respective cluster quality of the first candidate cluster and of the second candidate cluster relative to a cluster quality of the existing cluster each exceeds a cluster quality threshold.
This brief summary has been provided so that the nature of this disclosure may be understood quickly. A more complete understanding can be obtained by reference to the following detailed description and to the attached drawings.
Host computer 41 also includes computer-readable memory media such as computer hard disk 45 and DVD disk drive 44, which are constructed to store computer-readable information such as computer-executable process steps. DVD disk drive 44 provides a means whereby host computer 41 can access information, such as image data, computer-executable process steps, application programs, etc. stored on removable memory media. Other devices for accessing information stored on removable or remote media may also be provided.
Host computer 41 may acquire digital image data from other sources such as a digital video camera, a local area network or the Internet via a network interface. Likewise, host computer 41 may interface with other color output devices, such as color output devices accessible over a network interface.
Display screen 42 displays a clustering of data. In that regard, while the below processes will generally be described with respect to images for purposes of conciseness, it should be understood that other embodiments could also operate on other objects. For example, other embodiments could be directed to clustering omic data, audio files or moving image files. In that regard, as described herein, “object” refers to the data being clustered (e.g., images, audio files, omic files, or moving image files). At least one of the objects is described using semantic information. Meanwhile, “feature” refers to features of the objects, which can be examined to determine whether to merge the clusters of objects, as described below. “Object feature” may also be used herein to describe the features of the objects.
In addition, while
RAM 115 interfaces with computer bus 114 so as to provide information stored in RAM 115 to CPU 110 during execution of the instructions in software programs such as an operating system, application programs, cluster merging modules, and device drivers. More specifically, CPU 110 first loads computer-executable process steps from fixed disk 45, or another storage device into a region of RAM 115. CPU 110 can then execute the stored process steps from RAM 115 in order to execute the loaded computer-executable process steps. Data such as color images or other information can be stored in RAM 115, so that the data can be accessed by CPU 110 during the execution of computer-executable software programs, to the extent that such software programs have a need to access and/or modify the data.
As also shown in
Cluster merging module 123 comprises computer-executable process steps, and generally comprises an input module, a compactness evaluation module, a quality evaluation module, and a merging module. Cluster merging module 123 inputs clusters of images (or other data), and outputs a determination of whether or not to merge the image clusters, along with, in some cases, the merged clusters. More specifically, cluster merging module 123 comprises computer-executable process steps executed by a computer for causing the computer to perform a method for determining whether to merge clusters of objects, as described more fully below.
The computer-executable process steps for cluster merging module 123 may be configured as a part of operating system 118, as part of an output device driver such as a printer driver, or as a stand-alone application program such as an image management system. They may also be configured as a plug-in or dynamic link library (DLL) to the operating system, device driver or application program. For example, cluster merging module 123 according to example embodiments may be incorporated in an input/output device such as a camera with a display, in a mobile output device (with or without an input camera) such as a cell-phone or music player, or provided in a stand-alone image management application for use on a general purpose computer. It can be appreciated that the present disclosure is not limited to these embodiments and that the disclosed cluster merging module 123 may be used in other environments in which image clustering is used.
In particular,
As shown in
Briefly, in
In more detail, in step 401, images or other data to be clustered are input. The images may, for example, be previously stored (e.g., as image files 125 on fixed disk 45), or may be acquired from another device over a network or local connection. Numerous other methods for inputting images or other data may be used, but for purposes of conciseness will not be described here in detail.
In step 402, the input images are clustered. The clustering may be performed according to known methods, such as K-means clustering of features derived from the images.
In step 403, semantic information is input for at least one of the images. In particular, at least one of the images will have a semantic label or “ground truth” by which the image has been previously categorized. For example, an image object including a set of features which in some manner depict or describe a dog may be labeled “dog”, and this semantic label is input for use in determining whether to merge any clusters, as described below with respect to step 405. Thus, in some embodiments, the semantic information describes one or more semantic labels of an image.
In step 404, there is an evaluation of the compactness of a candidate cluster to be formed when a first and second cluster are merged. Put another way, there is a determination of whether the candidate merged cluster extent will be “small enough”. In the following analysis, the statistics of the sub-clusters (i.e., the first and second clusters) and the entire cluster (i.e., the candidate merged cluster) are examined.
Examples of evaluating compactness of a candidate cluster will now be described with respect to
Suppose a cluster is formed from a multivariate (d dimensional) Gaussian distribution with a diagonal covariance matrix with all diagonal elements equal to σ2. A hyper-plane can be considered which partitions samples generated from this distribution into two sub-clusters. A hyper-plane is a plane in multiple dimensions, e.g., a dividing line or plane between data points in multiple dimensions, and has all the characteristics of a plane. Since the multivariate Gaussian distribution is symmetric, it can be considered that all separating hyper-planes that are a fixed distance (as measured by a normal line to the plane) from the cluster mean are equivalent. Put another way, splits of symmetric multivariate Gaussians by a hyper-plane can be considered as 1-dimensional splits in the direction of a line from the mean and orthogonal to the splitting hyper-plane, whereas the remaining dimensions orthogonal to this dimension are left unchanged. In other words, this is equivalent to considering the split of the multivariate Gaussian distribution in just one dimension, while the other dimensions are left as is.
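This equivalence can be checked numerically: splitting samples of an isotropic two-dimensional Gaussian by a hyper-plane changes the deviation only along the direction orthogonal to the plane. The sketch below uses assumed parameters; for a split at the mean, the standard deviation of each half along the split direction has the known closed form σ√(1−2/π).

```python
import math
import random

def right_half_stds(n=100_000, sigma=1.0, a=0.0, seed=1):
    """Sample an isotropic 2-D Gaussian and keep the half with x > a
    (the hyper-plane here is the vertical line x = a).
    Returns (std along x, std along y) of the kept samples."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x, y = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        if x > a:
            xs.append(x)
            ys.append(y)

    def std(v):
        m = sum(v) / len(v)
        return math.sqrt(sum((t - m) ** 2 for t in v) / len(v))

    return std(xs), std(ys)
```

For a split at the mean (a = 0), the deviation along y stays at σ while the deviation along x shrinks to about 0.603σ, matching the one-dimensional analysis.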
In that regard,
In one embodiment, cluster compactness is evaluated based at least on a standard deviation in a direction of a line connecting the center of the first cluster and the center of the second cluster in a vector space defined by the first cluster and the second cluster.
If the Gaussian distribution in
In the above equation,
σ is the standard deviation of the original Gaussian distribution before the split, and a is the dividing point/line/hyper-plane. Meanwhile, the variance of the right region (how “spread out” the feature points are) is given by
If the total distribution generates N samples, then the number of samples expected to be in R is
so as an estimate of the normalized partition value
The above analysis can be repeated for the variance of L:
When the two clusters L and R are merged, the extent of the clusters is only increased in the direction orthogonal to the separating hyper-plane. If the L and R clusters are drawn from a single multivariate Gaussian distribution, the new standard deviation in the merged direction is simply σ. If the L and R clusters are generated from separated distinct Gaussians, then the extent of the merger in that direction is closer to σL+σR, and can even be greater than this when the clusters are far apart.
Thus, in the case when the data is generated from a single Gaussian distribution, adding the width of R and L gives
If the added width of L and R is larger than the width of the merged cluster (to be determined as shown below), then L and R should be merged.
In that regard, the merged deviation, σ, increases if the L and R clusters are actually separate. Therefore, a cluster merger based on compactness is tested as one that satisfies the single Gaussian model:
However, in a d-dimensional space, it is often not convenient to measure the deviation in the direction orthogonal to the separating hyper-plane. Therefore, the average deviation in all dimensions of the L and R clusters is measured instead, as σ̂_L and σ̂_R respectively. The mean deviations are assumed to be σ̂ in d−1 dimensions and σ̂_L or σ̂_R in one dimension, where σ̂ is the average deviation in all dimensions of the merged cluster. Thus:
Accordingly, the cluster compactness is evaluated based at least on an average standard deviation in all dimensions of one or more object features in a cluster. The above leads to the compactness merge threshold, which is:
Thus, the right side of the above equation is the threshold which can be used to determine whether to merge the clusters L and R. The compactness threshold is based on a number of objects in the first cluster, the number of objects overall, and the number of dimensions of the object features. In the compactness merge threshold above, noting that
Plotting
yields the compactness threshold curve shown in
From
Examples of input data clusters and decisions whether or not to merge the clusters based on cluster compactness, using the threshold defined above, will now be described with respect to
A first example of input data points is shown in
The samples on the left side of the plot in
The value 1.26 is less than the recommended merge threshold of about 1.61 from
as defined above. Accordingly, a merger of these clusters would not be recommended.
Referring back to
Meanwhile, a second example of input data points is shown in
The samples on the left side of
In this case, 1.57 is also less than the recommended merge threshold of about 1.61 (from
as defined above). Accordingly, a merger of these clusters would also not be recommended.
A third example is shown in
Thus, in this case, 1.67 exceeds the merger threshold of about 1.61, suggesting that a merger of the left and right clusters is acceptable.
Generally, when the data is drawn from the same distribution as in the example above, the ratio of deviations may fall close to the threshold value. In some cases, due to random variation, the ratio may not exceed the theoretical threshold. Thus, in some embodiments, the threshold may be modified to allow more or fewer mergers. In other embodiments, the change in the threshold may depend on the number of samples observed, since larger sample sizes yield less variance in the estimated statistics, which can therefore be trusted as more likely to be accurate.
In the above examples described with respect to
σ_L + σ_R = σ  (12)
This leads to a practical threshold of:
Accordingly, in the uniform case, the threshold is constant for a fixed number of dimensions, and does not depend on the number of elements in the L and R clusters. As shown in
Returning now to
In that regard, while the above compactness criterion is important for determining acceptable cluster mergers from an unsupervised perspective, the cluster quality criterion determines the acceptability of mergers from a supervised perspective. Both of these perspectives are important. Without the compactness criterion, clusters of similar truth composition may be merged even though they are disjoint. On the other hand, without the cluster quality criterion, clusters that are close together may be merged despite their different compositions of truth labels. Both are indicators of whether the data are drawn from the same or different distributions in both space and labels.
Thus, both the compactness measure and the cluster quality measure are used. Generally, the compactness measure is faster to compute and is therefore applied first to weed out candidates, so that the slower cluster quality measure can be performed on fewer candidates. Moreover, using both measures can allow for a more appropriate stopping point for merging, and specifically can mirror a desired breadth of visual vocabulary as described above.
Turning now to step 405, evaluation of a cluster quality of a candidate cluster based on semantic information (e.g., a “ground truth” or “label”) will now be described.
For example, the system may be presented with a clustering of C clusters and, in step 405, evaluate whether merging two clusters together would improve the clustering quality. In general, having fewer clusters to describe a data set is preferable to having more, but joining clusters of different classes of objects is undesirable because the merged cluster becomes less specific. A Rand Index or adjusted Rand Index measure could be used, for example, to test the clustering quality before and after the merger of two clusters, to decide whether the merger provides a better clustering. However, it can be easier to look at the difference of the two measures, since many components are shared between them. Thus, it is useful to determine when to merge or not merge clusters based on the similarity or dissimilarity of cluster content, and to determine which of the mergeable clusters would provide the best merger choice.
A contingency table is used to summarize the clustering of labeled objects into multiple clusters. The table M is a matrix whose element in the i-th row and j-th column is labeled n_ij; n_ij is the count of the number of objects with label i that are in cluster j.
If cluster j and cluster k were to be merged into a single cluster, the two columns of the contingency table could be combined by summing them and putting them into a new column while removing columns j and k. Letting α* be the new column vector, it can be seen that α* = α_j + α_k, where α_j is the unmerged j-th column.
The unmerged Relational Rand Index is given as RRI_0:
In the above equation, a is the row sum vector (the number of objects with each label) with elements given by a_i = Σ_{j=1}^{C} n_ij, b is the column sum vector (the number of objects in each cluster) with elements given by b_j = Σ_{i=1}^{R} n_ij, and N is the total number of objects. Details of calculating the unmerged Relational Rand Index are provided in U.S. application Ser. No. 13/542,433, entitled “Systems and methods for cluster analysis with relational truth” and in PCT/US2011/056441, entitled “Systems and methods for cluster validation”, the contents of which are incorporated by reference herein.
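By way of illustration, the contingency-table bookkeeping described above can be sketched as follows (the function names are illustrative):

```python
def contingency_table(labels, clusters):
    """Build M where M[i][j] = number of objects with label i in cluster j.
    Rows and columns follow the sorted order of the distinct labels and
    cluster identifiers."""
    rows = sorted(set(labels))
    cols = sorted(set(clusters))
    M = [[0] * len(cols) for _ in rows]
    for lab, clu in zip(labels, clusters):
        M[rows.index(lab)][cols.index(clu)] += 1
    return M

def merge_columns(M, j, k):
    """Combine columns j and k of the contingency table into one new
    column (their element-wise sum), removing the originals."""
    merged = [row[j] + row[k] for row in M]
    return [
        [v for c, v in enumerate(row) if c not in (j, k)] + [merged[i]]
        for i, row in enumerate(M)
    ]
```

Merging columns leaves the row sums (label counts) unchanged; only the column sums (cluster sizes) combine.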
In order to determine whether to merge, it is useful to know the RRI when clusters j and k are merged. First, the term b^T b is examined. Deleting the j-th and k-th columns and adding the merged column yields
Next, the term Σ_{c=1}^{C} α_c^T S α_c is examined under the merger:
The last step above is due to the fact that S is (typically) a symmetric matrix.
The other terms in the RRI expression are not changed under the merger. Thus, the difference in the RRI based on the merger can be evaluated as follows:
In order to evaluate the cluster quality, the merger Quality Improvement, Δ_jk, can be defined by removing the constant terms above:
This change can be compared to the expected value of the change to determine whether a merger improves clustering quality more than any change in quality that would occur at random. Thus, attention can now be turned to the expectation of Δ_jk.
The expectation of the quality improvement can be generated in multiple ways. One Adjusted Rand Index approach is to assume that the row sums (the class label distribution) are fixed while the cluster sizes are random, as described in PCT/US2011/056441 (cited above). In this case the expectation is taken over random M and random b. This approach is also repeated for the Adjusted Relational Rand Index in one embodiment of the disclosure.
For example, for a fixed a and a random b, the expected RRI improvement (namely E[Δ_jk | a, C]) can be calculated by reducing the number of clusters, and this value γ can then be used as a threshold on Δ_jk:
This approach has the advantage that these expectations do not depend on the clustering results, and therefore remain the same for all possible pairs of mergers from C clusters. Thus the expectation of the RRI only needs to be calculated for C and C−1 clusters. The details for calculating these expectations are given in U.S. application Ser. No. 13/542,433 and in PCT/US2011/056441 mentioned above.
An alternative embodiment uses the b values (the sizes of the clusters) in the calculation.
Details of calculations under this alternative embodiment are also described in U.S. application Ser. No. 13/542,433 and in PCT/US2011/056441 mentioned above. Thus, in various embodiments, the cluster quality threshold can be calculated using, for example, an expected Rand Index, an expected Relational Rand Index, or an expected Mutual Information measure.
By using the value γ (the expected improvement in quality) as a threshold on Δ_jk, it is possible to determine whether to merge two clusters.
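By way of illustration, the merge-or-not quality decision can be sketched with the classical adjusted Rand Index standing in for the Relational Rand Index and its expectation, which are defined in the incorporated applications. This is a simplified stand-in, not the formula of any particular embodiment; here a merger is accepted only if the quality measure does not decrease.

```python
def comb2(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2

def ari(M):
    """Adjusted Rand Index computed from a contingency table M
    (rows = truth labels, columns = clusters)."""
    N = sum(sum(row) for row in M)
    sum_ij = sum(comb2(v) for row in M for v in row)
    sa = sum(comb2(sum(row)) for row in M)            # over label counts
    sb = sum(comb2(sum(col)) for col in zip(*M))      # over cluster sizes
    expected = sa * sb / comb2(N)
    max_index = (sa + sb) / 2
    return (sum_ij - expected) / (max_index - expected)

def merge_improves_quality(M, j, k):
    """Accept the merger of clusters j and k only if the (stand-in)
    quality measure does not decrease."""
    merged = [
        [v for c, v in enumerate(row) if c not in (j, k)] + [row[j] + row[k]]
        for row in M
    ]
    return ari(merged) >= ari(M)
```

For a table where one label is split across two pure clusters, merging those clusters raises the index; merging clusters with different labels lowers it.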
Returning again to
In particular, the process of determining whether to merge clusters may be repeatedly applied to a set of candidate cluster pairs to be merged. In some embodiments this process is repeated until there are no remaining candidate pairs to be merged or until all of candidate pairs have been determined to be not suitable for merging.
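By way of illustration, the repeated application described above can be sketched as a greedy loop. The `should_merge` callback is a hypothetical stand-in for the combined compactness and quality tests:

```python
def greedy_merge(clusters, should_merge):
    """Repeatedly merge the first candidate pair that passes
    `should_merge` (assumed to bundle both the compactness and quality
    tests), until no pair of clusters qualifies."""
    clusters = [list(c) for c in clusters]
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_merge(clusters[i], clusters[j]):
                    clusters[i] = clusters[i] + clusters[j]
                    del clusters[j]
                    merged_any = True
                    break
            if merged_any:
                break       # restart the scan over the reduced set
    return clusters
```

With a toy rule that merges one-dimensional clusters whose means are within 2 of each other, two nearby clusters collapse into one while a distant cluster is left alone.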
In more detail,
In that regard, additional information or factors may be used to rank clusters to be examined for merger. For example, a ranking could consider inter-cluster distance (the distance between the two clusters being considered for merger). Thus, in this case, first and second clusters are selected as candidates to merge from a plurality of clusters, based in part on a distance between the first and second clusters. A selection of clusters to merge might also consider cluster spread (the distance between the sub-clusters divided by the sum of the average sub-cluster deviations). Accordingly, in such an embodiment, the first and second clusters are selected as candidates to merge from a plurality of clusters, based on a distance between the first and second clusters relative to the sum of the average standard deviations of object features in the first and second clusters. In still another example, a selection of clusters to merge might consider a modified cluster spread (the distance between the sub-clusters divided by the sum of the average merged cluster deviations). Put another way, in that embodiment, the first and second clusters are selected as candidates to merge from a plurality of clusters, based on a distance between the first and second clusters relative to the sum of the average standard deviations of object features in the candidate cluster. It should be understood that various other combinations of evaluations could be used in such a determination.
In general, the ranking function could take a plurality of rank factors and threshold scores and combine them in a way that the order of the cluster mergers can provide the best increase in knowledge representation and retention as measured by any number of measures such as Adjusted RRI (Relational Rand Index), Adjusted RI (Rand Index), and Adjusted Mutual Information, as just a few examples.
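By way of illustration, a ranking by the "cluster spread" factor described above can be sketched as follows (the function names are illustrative, not part of any embodiment):

```python
import math

def centroid(cluster):
    """Arithmetic mean of the cluster's points (tuples of coordinates)."""
    d = len(cluster[0])
    return [sum(p[i] for p in cluster) / len(cluster) for i in range(d)]

def avg_std(cluster):
    """Average, over dimensions, of the standard deviation of the points."""
    n, d = len(cluster), len(cluster[0])
    total = 0.0
    for i in range(d):
        m = sum(p[i] for p in cluster) / n
        total += math.sqrt(sum((p[i] - m) ** 2 for p in cluster) / n)
    return total / d

def rank_candidates(clusters):
    """Rank cluster pairs by cluster spread: inter-centroid distance
    divided by the sum of the average sub-cluster deviations; pairs with
    the smallest spread are ranked first as merge candidates."""
    pairs = []
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            dist = math.dist(centroid(clusters[i]), centroid(clusters[j]))
            spread = dist / (avg_std(clusters[i]) + avg_std(clusters[j]))
            pairs.append((spread, i, j))
    return sorted(pairs)
```

Pairs whose separation is small relative to their internal deviations are examined first, since they are the most plausible mergers.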
Returning again to
In step 409, a display of the merged clusters is output (e.g., on display screen 42). For example, a representative image of a merged cluster of images could be selected as a representative image of the cluster for display. In step 410, the process ends.
Further examples of the cluster merger method will now be described with respect to
In particular,
Specifically,
Meanwhile,
Turning to
Turning to
By determining whether to merge clusters of objects based on both a cluster compactness measure and a cluster quality measure, it is ordinarily possible to create a visual vocabulary with an appropriate number of clusters. For example, it is ordinarily possible to create a visual vocabulary which generalizes when necessary (i.e. when there is insufficient data to be more specific or too much noise or variation to be more specific), but also has a sufficient number of visual words to describe different visual features.
An alternative embodiment might instead consider whether to split a single cluster into two clusters, using the same compactness and quality measures, and based on how the split clusters would look. In such an embodiment, the system or a user could determine the hyper-plane (e.g., by a user interface) using a known clustering technique, and essentially use the above processes in reverse.
Thus, according to such an alternative embodiment, an existing cluster of objects is split into a plurality of clusters. Semantic information of at least one of the objects in the existing cluster is input. A respective compactness is evaluated for each of a first candidate cluster and a second candidate cluster to be formed when the existing cluster is split. A respective cluster quality is evaluated for each of the first candidate cluster and the second candidate cluster, based on the semantic information. The existing cluster is split in a case that the respective compactness of the first candidate cluster and of the second candidate cluster relative to the compactness of the existing cluster each exceeds a compactness threshold, and the respective cluster quality of the first candidate cluster and of the second candidate cluster relative to a cluster quality of the existing cluster each exceeds a cluster quality threshold.
In other embodiments, the compactness threshold for cluster splitting is weighted more leniently. In other words, if the splitting of the cluster, as determined by some known clustering technique for example, has been recommended, the change in the cluster quality can be considered as a more important criterion than compactness. Thus, compact clusters may be allowed to be split when doing so results in improved cluster quality.
According to other embodiments contemplated by the present disclosure, example embodiments may include a computer processor such as a single core or multi-core central processing unit (CPU) or micro-processing unit (MPU), which is constructed to realize the functionality described above. The computer processor might be incorporated in a stand-alone apparatus or in a multi-component apparatus, or might comprise multiple computer processors which are constructed to work together to realize such functionality. The computer processor or processors execute a computer-executable program (sometimes referred to as computer-executable instructions or computer-executable code) to perform some or all of the above-described functions. The computer-executable program may be pre-stored in the computer processor(s), or the computer processor(s) may be functionally connected for access to a non-transitory computer-readable storage medium on which the computer-executable program or program steps are stored. For these purposes, access to the non-transitory computer-readable storage medium may be a local access such as by access via a local memory bus structure, or may be a remote access such as by access via a wired or wireless network or Internet. The computer processor(s) may thereafter be operated to execute the computer-executable program or program steps to perform functions of the above-described embodiments.
According to still further embodiments contemplated by the present disclosure, example embodiments may include methods in which the functionality described above is performed by a computer processor such as a single core or multi-core central processing unit (CPU) or micro-processing unit (MPU). As explained above, the computer processor might be incorporated in a stand-alone apparatus or in a multi-component apparatus, or might comprise multiple computer processors which work together to perform such functionality. The computer processor or processors execute a computer-executable program (sometimes referred to as computer-executable instructions or computer-executable code) to perform some or all of the above-described functions. The computer-executable program may be pre-stored in the computer processor(s), or the computer processor(s) may be functionally connected for access to a non-transitory computer-readable storage medium on which the computer-executable program or program steps are stored. Access to the non-transitory computer-readable storage medium may form part of the method of the embodiment. For these purposes, access to the non-transitory computer-readable storage medium may be a local access such as by access via a local memory bus structure, or may be a remote access such as by access via a wired or wireless network or Internet. The computer processor(s) is/are thereafter operated to execute the computer-executable program or program steps to perform functions of the above-described embodiments.
The non-transitory computer-readable storage medium on which a computer-executable program or program steps are stored may be any of a wide variety of tangible storage devices which are constructed to retrievably store data, including, for example, any of a flexible disk (floppy disk), a hard disk, an optical disk, a magneto-optical disk, a compact disc (CD), a digital versatile disc (DVD), micro-drive, a read only memory (ROM), random access memory (RAM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), dynamic random access memory (DRAM), video RAM (VRAM), a magnetic tape or card, optical card, nanosystem, molecular memory integrated circuit, redundant array of independent disks (RAID), a nonvolatile memory card, a flash memory device, a storage of distributed computing systems and the like. The storage medium may be a function expansion unit removably inserted in and/or remotely accessed by the apparatus or system for use with the computer processor(s).
This disclosure has provided a detailed description with respect to particular representative embodiments. It is understood that the scope of the appended claims is not limited to the above-described embodiments and that various changes and modifications may be made without departing from the scope of the claims.