The accompanying drawings, which are included to provide a further understanding of the invention, are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the principle of the invention. In the drawings:
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings.
Hereinafter, a method and system for semidefinite spectral clustering via convex programming according to an embodiment of the present invention will be described with reference to the accompanying drawings.
That is,
Although the well-known conventional spectral clustering method also relies on graph partitioning, which is likewise the subject of the present invention, the clustering method according to the present embodiment differs from it in the relaxation method. The conventional spectral clustering method uses spectral relaxation: it groups data into clusters using the eigenvectors of an affinity matrix that represents similarity, or of a graph Laplacian generated from the data. In contrast, the semidefinite spectral clustering method according to the present embodiment clusters data using the eigenvectors of an optimal feasible solution, which is obtained according to whether the given strong duality for the semidefinite relaxation is satisfied. Semidefinite relaxation is used in the present embodiment because it makes it possible to obtain a globally optimal solution in various combinatorial problems such as multi-way graph partitioning.
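For comparison, the conventional spectral relaxation described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the unnormalized Laplacian L = D - W and a two-way split by the sign of the second eigenvector, neither of which is specified by the present description, and the function names and the toy affinity matrix are hypothetical.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W (assumed form) for a symmetric affinity matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def spectral_bipartition(W):
    """Split the nodes into two groups by the sign of the second-smallest eigenvector of L."""
    L = graph_laplacian(W)
    _, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    fiedler = eigvecs[:, 1]         # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy data (hypothetical): two triangles joined by one weak edge of weight 0.1.
W = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
labels = spectral_bipartition(W)
```

Which triangle receives label 0 depends on the sign convention of the eigenvector returned by the solver, so only the grouping is meaningful, not the label values.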
As shown in
The optimization steps S2 and S3 obtain the globally optimal solution for the strong duality and the objective function, both of which are defined by a user. In more detail, an optimal feasible matrix is calculated using semidefinite programming at step S2, and an optimal partition matrix is calculated from the optimal feasible matrix at step S3. The optimization steps S2 and S3 will be described in more detail with reference to
The clustering step S4 is the last step that clusters data using the optimal feasible matrix obtained from the optimization step. The clustering step S4 will be described in more detail with reference to
The objective function is defined as $\arg\min_{X} \operatorname{tr}(X^{T} L X)$.
Herein, X denotes an optimal partition matrix, L is a graph Laplacian, and T denotes the transpose of a matrix.
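To make the objective concrete, the following sketch evaluates tr(X^T L X) for two candidate partitions of a small graph. It assumes that X is a 0/1 cluster-indicator matrix and that L is the unnormalized Laplacian D - W; the helper names and the example weights are hypothetical. Under those assumptions the trace equals twice the total weight of the edges cut by the partition, so a smaller value means a better partition.

```python
import numpy as np

def partition_matrix(labels, k):
    """Build an n-by-k indicator matrix X with X[i, j] = 1 when item i is in cluster j."""
    labels = np.asarray(labels)
    X = np.zeros((labels.size, k))
    X[np.arange(labels.size), labels] = 1.0
    return X

def cut_objective(W, labels, k):
    """Evaluate tr(X^T L X) with L = D - W (assumed unnormalized Laplacian)."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    X = partition_matrix(labels, k)
    return float(np.trace(X.T @ L @ X))

# Toy affinity matrix (hypothetical): strong edges 0-1 and 2-3, weak edges 0-2 and 1-3.
W = np.array([
    [0.0, 1.0, 0.2, 0.0],
    [1.0, 0.0, 0.0, 0.2],
    [0.2, 0.0, 0.0, 1.0],
    [0.0, 0.2, 1.0, 0.0],
])
good = cut_objective(W, [0, 0, 1, 1], k=2)  # cuts only the two weak edges
bad = cut_objective(W, [0, 1, 0, 1], k=2)   # cuts the two strong edges
```

The partition that keeps the strongly connected pairs together yields the smaller objective value.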
In order to cluster data, clustering methods including k-means, EM, or k-nn may be used.
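As one illustration of the final clustering step, the sketch below runs a plain k-means (Lloyd's algorithm) on the rows of a small embedding matrix, such as one formed from eigenvectors. The deterministic farthest-point initialization, the function name, and the sample points are assumptions made for reproducibility and are not taken from the present description.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-point initialization (assumed)."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old center if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical two-dimensional embedding: each row represents one data item.
embedding = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
labels, centers = kmeans(embedding, k=2)
```

EM or k-nn could be substituted for this routine without changing the preceding optimization steps.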
The optimal feasible solution is defined based on the similarity or the difference between data. When the affinity matrix or the difference matrix of the data is generated, it is preferable to use a kernel function. Herein, the object of the optimization is to obtain the optimal feasible solution that satisfies the given strong duality. All solutions that satisfy the given strong duality are feasible solutions, and the one having the highest value or the smallest value among the feasible solutions is the optimal feasible solution. It is preferable to extract feature points from the data for generating the affinity matrix and the difference matrix of the data. It is further preferable to apply the affinity matrix and the difference matrix to identical data or different data.
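A kernel-based affinity matrix of the kind mentioned above might be generated as follows. The Gaussian (RBF) kernel, the bandwidth parameter sigma, the use of squared Euclidean distance as the difference matrix, and the zeroed diagonal are all assumptions chosen for illustration; the present description does not name a specific kernel.

```python
import numpy as np

def pairwise_sq_dists(features):
    """Difference matrix (assumed form): squared Euclidean distance between feature vectors."""
    F = np.asarray(features, dtype=float)
    return np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=2)

def gaussian_affinity(features, sigma=1.0):
    """Affinity matrix from a Gaussian kernel: A[i, j] = exp(-d_ij^2 / (2 * sigma^2))."""
    A = np.exp(-pairwise_sq_dists(features) / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-similarity edges, a common convention (assumed)
    return A

# Hypothetical feature points: two close together, one far away.
features = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
D = pairwise_sq_dists(features)
A = gaussian_affinity(features, sigma=1.0)
```

Nearby feature points receive an affinity close to 1 and distant points an affinity close to 0, which is what gives the affinity matrix its near block structure when the data contain well-separated groups.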
The flowchart shown in
As shown in
Herein, it is determined at step S15 whether a relaxed standard semidefinite program (SDP) satisfies the strong duality. The relaxed standard SDP is a function relaxed through semidefinite programming, which is one type of convex programming. If the strong duality is not satisfied by the relaxed standard SDP, the optimal solution is obtained at step S16 by a barycenter-based method that uses the barycenter matrix of the convex hull of the partition matrices. If the strong duality is satisfied by the relaxed standard SDP, the optimal solution is calculated at step S17 using an interior-point method, which applies Newton's method as a technique for solving linear equality constrained optimization problems. Herein, the interior-point method solves an optimization problem with linear equality and inequality constraints by reducing it to a sequence of linear equality constrained problems.
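The branch taken at step S15 can be pictured with a small sketch. It assumes that the primal and dual optimal values of the relaxed SDP are already available from a solver and that strong duality is judged numerically by a relative tolerance on the gap between them; the description does not state how the determination is made, so the tolerance, the function names, and the returned labels are hypothetical.

```python
def duality_gap(primal_value, dual_value):
    """Absolute gap between the primal and dual objective values."""
    return abs(primal_value - dual_value)

def choose_solver(primal_value, dual_value, tol=1e-6):
    """Pick the branch after step S15 (labels are hypothetical).

    Strong duality is treated as satisfied when the gap is within a
    relative tolerance; this numeric criterion is an assumption.
    """
    scale = max(1.0, abs(primal_value), abs(dual_value))
    if duality_gap(primal_value, dual_value) <= tol * scale:
        return "interior_point"  # corresponds to step S17
    return "barycenter"          # corresponds to step S16

# Made-up objective values for illustration only.
branch_tight = choose_solver(1.0, 1.0)
branch_loose = choose_solver(1.0, 0.5)
```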
The flowchart shown in
A clustering simulation is performed by using the semidefinite relaxation to give the matrix that is directly related to the generation of the eigenvectors a block diagonal structure, and by forming principal vectors, namely the 1st column vector and the 2nd column vector, obtained from the optimal feasible matrix. The clustering result of the clustering simulation (sample data set) is illustrated in
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention. Thus, it is intended that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.
As described above, the method for clustering data using convex optimization according to the present invention can be used in various fields where large amounts of data are classified and analyzed. Such an automated process can save substantial resources such as time and manpower. Also, the method can simultaneously cluster not only homogeneous data but also heterogeneous data, so that useful data can be provided to a user. Furthermore, the method can provide reliable clustering performance by overcoming, through convex optimization, the heuristic limitations of the conventional clustering methods.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10-2006-0064551 | Jul 2006 | KR | national |
| 10-2007-0057223 | Jun 2007 | KR | national |