LATER-FUSION MULTIPLE KERNEL CLUSTERING MACHINE LEARNING METHOD AND SYSTEM BASED ON PROXY GRAPH IMPROVEMENT

Information

  • Patent Application
  • 20240248961
  • Publication Number
    20240248961
  • Date Filed
    May 30, 2022
  • Date Published
    July 25, 2024
Abstract
A later-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement is provided. The method includes: S1. acquiring a clustering task and a target data sample; S2. initializing a proxy graph improvement matrix; S3. running k-means clustering and graph improvement on each view corresponding to the acquisition of the clustering task and the target data sample, and constructing an objective function by combining kernel k-means clustering and graph improvement methods; S4. cyclically solving the objective function constructed in step S3 so as to obtain a graph matrix, which is fused with basic kernel information; and S5. performing spectral clustering on the obtained graph matrix, so as to obtain a final clustering result. By means of the method, an optimized basic division not only has information of a single kernel, but can also obtain global information by means of a proxy graph.
Description
TECHNICAL FIELD

The present disclosure relates to the technical field of machine learning, and in particular to a later-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement.


BACKGROUND

Aiming at dividing unlabeled data into several unrelated classes, clustering plays an important role in machine learning and data analysis. In the era of big data, data may come from multiple sources, known as multi-view data. Methods for clustering multi-view data are called multi-view clustering algorithms. Multi-kernel clustering algorithms are an important branch of multi-view clustering, with the aim of making full use of a series of predefined basis kernels to improve clustering performance.


Existing multi-kernel clustering algorithms can be roughly divided into two categories, i.e. early fusion and later fusion, according to the timing of fusion. Early fusion refers to a fusion of several kernel matrices before performing a kernel k-means algorithm. Specifically, a method of regularization term induced by matrix (X. Liu, Y. Dou, J. Yin, et al., Multiple Kernel K-means Clustering with Matrix-induced Regularization, in AAAI 2016, pp. 1888-1894) can be used to adaptively adjust kernel coefficients according to the similarity of the kernel matrices, such that the redundancy of similar information is avoided and the quality of the optimal kernel matrix is accordingly improved. A method for maintaining the local structure of a kernel (M. Gonen and A. A. Margolin, Localized Data Fusion for Kernel K-means Clustering with Application to Cancer Biology, in NeurIPS 2014, pp. 1305-1313) can also improve the effects of algorithms.


For later fusion of multi-kernel clustering, the kernel k-means algorithm is first performed on the basis kernel matrices to obtain basic divisions, and then these basic divisions are fused. A later fusion algorithm based on maximum alignment (S. Wang, X. Liu, E. Zhu, et al., Multi-view Clustering via Late Fusion Alignment Maximization, in IJCAI 2019, pp. 3778-3784) makes use of permutation matrices to align the basic divisions and then combines them. A later fusion method proposed by Liu et al. (X. Liu, M. Li, C. Tang, et al., Efficient and Effective Regularized Incomplete Multi-view Clustering, in T-PAMI 2020) is capable of handling incomplete view data and obtaining a good clustering effect.


Compared with early fusion, later fusion features very low computational and storage complexity and relatively desirable clustering performance. However, existing later fusion clustering algorithms still have the following deficiencies. First, the clustering process on the basis kernels is separated from the later fusion of the basic divisions, in which case the quality of the basic divisions has a great impact on the performance of the final clustering, and any outliers and noises therein will result in an unsatisfactory clustering effect. Second, existing methods simply take a consistency division as a linear transformation of the basic divisions, making them difficult to apply to real-world multi-kernel data.


SUMMARY

In order to overcome deficiencies in the prior art, the present disclosure provides a later-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement.


In order to achieve the above objective, the present disclosure adopts the following technical solution:

    • a later-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement, including the following steps:
    • S1. acquiring a clustering task and a target data sample;
    • S2. initializing a proxy graph improvement matrix;
    • S3. running k-means clustering and graph improvement on each view corresponding to the acquisition of the clustering task and the target data sample, and constructing an objective function by combining kernel k-means clustering and graph improvement methods;
    • S4. cyclically solving the objective function constructed in the step S3 so as to obtain a graph matrix, which is fused with basic kernel information; and
    • S5. performing spectral clustering on the obtained graph matrix, so as to obtain a final clustering result.


Further, the objective function of kernel k-means clustering constructed in the step S3 is expressed as:

    min_{B ∈ {0,1}^{n×k}} Σ_{i=1,c=1}^{n,k} B_ic ‖ϕ(x_i) − μ_c‖₂²,  s.t. Σ_{c=1}^{k} B_ic = 1        (1)

    • where {x_i}_{i=1}^n ⊆ ℝ^d represents a data set consisting of n samples; B ∈ {0,1}^{n×k} represents a clustering indicator matrix: when the ith sample belongs to the cth cluster, B_ic = 1, otherwise B_ic = 0; ϕ: x ∈ ℝ^d → ℋ represents the feature mapping that projects a sample x into a reproducing kernel Hilbert space ℋ; μ_c = (1/n_c) Σ_{i=1}^{n} B_ic ϕ(x_i), where n_c represents the number of samples belonging to the cth cluster; x_i represents a data sample; i represents a sample serial number; n represents the number of sample points; and k represents the total number of clusters.





Assuming ⟨ϕ(x_i), ϕ(x_j)⟩ = K_ij, where K_ij represents elements of a kernel matrix K, then Equation (1) is expressed as:

    min_{B ∈ {0,1}^{n×k}} Tr(K) − Tr(L^{1/2} B^T K B L^{1/2}),  s.t. B 1_k = 1_n        (2)

    • where K represents the kernel matrix; L = diag([n_1^{−1}, …, n_k^{−1}]), in which n_k^{−1} represents the reciprocal of the total number of samples belonging to the kth cluster; 1_k ∈ ℝ^k represents a vector with all elements being 1; and B^T represents the transpose of B.





Assuming H = BL^{1/2} and H^T H = I_k, then Equation (2) is expressed as:

    min_{H^T H = I_k} Tr(K(I_n − H H^T))        (3)
    • where H^T represents the transpose of H; I_n represents an n-dimensional identity matrix; and I_k represents a k-dimensional identity matrix.
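The relaxed problem of Equation (3) is solved by eigendecomposition: the optimal H collects the eigenvectors of K associated with its k largest eigenvalues. A minimal numpy sketch of this step (function and variable names are ours, not from the disclosure):

```python
import numpy as np

def solve_relaxed_kkm(K, k):
    """Minimize Tr(K(I_n - H H^T)) over H with H^T H = I_k (Equation (3)):
    the optimum collects the eigenvectors of K for its k largest eigenvalues."""
    _, vecs = np.linalg.eigh((K + K.T) / 2)  # eigh returns eigenvalues in ascending order
    return vecs[:, -k:]                      # eigenvectors of the k largest eigenvalues
```

Because `numpy.linalg.eigh` returns an orthonormal eigenbasis, the returned H automatically satisfies the constraint H^T H = I_k.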





Further, the objective function constructed in the step S3 is expressed as:

    min_{S, {H_i}_{i=1}^{m}} Σ_{i=1}^{m} Tr(K_i(I_n − H_i H_i^T)) + λ‖H_i − S H_i‖_F² + β‖S‖_F²        (4)

    s.t. S ≥ 0,  S1 = 1,  diag(S) = 0,  H_i^T H_i = I_k        (5)

    • where H_i represents the basic division matrix obtained from the ith running of kernel k-means clustering; λ and β represent hyperparameters that adjust the proportion of each item; H_i^T represents the transpose of H_i; S represents the proxy graph matrix; and I_n represents the n-dimensional identity matrix.
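The objective of Equations (4)-(5) can be evaluated directly from its three terms. A minimal numpy sketch (names are illustrative, not from the disclosure):

```python
import numpy as np

def objective(Ks, Hs, S, lam, beta):
    """Value of Equation (4): kernel k-means terms, graph improvement terms,
    and the Frobenius regularization on the proxy graph S."""
    n = S.shape[0]
    total = 0.0
    for K, H in zip(Ks, Hs):
        total += np.trace(K @ (np.eye(n) - H @ H.T))          # Tr(K_i(I_n - H_i H_i^T))
        total += lam * np.linalg.norm(H - S @ H, 'fro') ** 2  # lam * ||H_i - S H_i||_F^2
    return total + beta * np.linalg.norm(S, 'fro') ** 2       # beta * ||S||_F^2
```

For a single identity kernel on four samples, a one-column indicator H and S = 0, the value is Tr(I₄ − HH^T) + λ‖H‖_F² = 3 + 1 = 4 with λ = β = 1, which gives a quick sanity check.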





Further, the objective function constructed in the step S3 is cyclically solved in the step S4, specifically:


S41. fixing S and optimizing {H_i}_{i=1}^m, being expressed as:

    min_{H_i} Tr(K_i(I_n − H_i H_i^T)) + λ‖H_i − S H_i‖_F²,  s.t. H_i^T H_i = I_k        (6)

    • assuming G = K_i − λ(I_n − 2S + SS^T), then Equation (6) is expressed as:

    max_{H_i} Tr(G H_i H_i^T),  s.t. H_i^T H_i = I_k        (7)

    • performing eigendecomposition on G, taking H_i as the eigenvectors corresponding to its k largest eigenvalues, and thereby obtaining the optimal solution;
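The S41 update above can be sketched as follows, assuming symmetric kernel matrices (a numpy illustration; all names are ours):

```python
import numpy as np

def update_H(Ki, S, lam, k):
    """S41: with S fixed, build G = K_i - lam*(I_n - 2S + S S^T) and take the
    eigenvectors of its k largest eigenvalues as the new H_i (Equation (7))."""
    n = Ki.shape[0]
    G = Ki - lam * (np.eye(n) - 2 * S + S @ S.T)
    G = (G + G.T) / 2                 # symmetrize so eigh applies cleanly
    _, vecs = np.linalg.eigh(G)       # eigenvalues in ascending order
    return vecs[:, -k:]               # top-k eigenvectors
```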

    • S42. fixing {H_i}_{i=1}^m and optimizing S, being expressed as:

    min_S Σ_{i=1}^{m} λ‖H_i − S H_i‖_F² + β‖S‖_F²,  s.t. S ≥ 0,  S1 = 1,  diag(S) = 0        (8)

Equation (8) is solved through the steps S421 and S422:

    • S421. solving an unconstrained solution of Equation (8), being expressed as:

    Ŝ = argmin_S Σ_{i=1}^{m} λ‖H_i − S H_i‖_F² + β‖S‖_F²        (9)

Setting the derivative to 0 yields the closed-form solution Ŝ = (C + (β/λ)I)⁻¹ C, where C = Σ_{i=1}^{m} H_i H_i^T;

    • S422. calculating a solution closest to Ŝ that satisfies the constraints through Equation (10):

    min_S ‖S − Ŝ‖_F²,  s.t. S ≥ 0,  S1 = 1,  diag(S) = 0        (10)

    • where Ŝ represents a solution of the proxy graph matrix when being unconstrained.





Obtaining a closed-form solution:

    S_{j,:} = max(Ŝ_{j,:} + α_j 1, 0),  S_{jj} = 0,  α_j = (1 − Ŝ_{j,:}^T 1)/n        (11)

where S_{j,:} represents the jth row of the matrix S; α_j represents an intermediate variable used in the solution; Ŝ_{j,:} represents the jth row of Ŝ; and Ŝ_{j,:}^T represents the transpose of Ŝ_{j,:}.
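Steps S421 and S422 together admit the following sketch (numpy; names are ours, not the patent's). Note that the one-shot shift-and-clip of Equation (11) enforces non-negativity and a zero diagonal; after clipping, row sums may deviate slightly from 1 (an exact simplex projection would re-solve α_j):

```python
import numpy as np

def update_S(Hs, lam, beta):
    """S42: closed-form unconstrained solution of Equation (9), followed by the
    shift-and-clip projection of Equations (10)-(11)."""
    n = Hs[0].shape[0]
    C = sum(H @ H.T for H in Hs)                               # C = sum_i H_i H_i^T
    S_hat = np.linalg.solve(C + (beta / lam) * np.eye(n), C)   # (C + (beta/lam) I)^{-1} C
    alpha = (1 - S_hat.sum(axis=1, keepdims=True)) / n         # alpha_j of Eq. (11)
    S = np.maximum(S_hat + alpha, 0)                           # clip negative entries
    np.fill_diagonal(S, 0.0)                                   # diag(S) = 0
    return S
```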


Further, the objective function constructed in the step S3 is cyclically solved, with a cycle termination condition being expressed as:

    (obj^{(t−1)} − obj^{(t)}) / obj^{(t)} ≤ ε        (12)

    • where obj^{(t−1)} and obj^{(t)} represent the values of the objective function at the (t−1)th and tth iterations, respectively; and ε represents a set precision.





Correspondingly, the present disclosure further provides a later-fusion multiple kernel clustering machine learning system based on proxy graph improvement, including:

    • an acquisition module for acquiring a clustering task and a target data sample;
    • an initialization module for initializing a proxy graph improvement matrix;
    • a construction module for running k-means clustering and graph improvement on each view corresponding to the acquisition of the clustering task and the target data sample, and constructing an objective function by combining kernel k-means clustering and graph improvement methods;
    • a solution module for cyclically solving the constructed objective function, so as to obtain a graph matrix, which is fused with basic kernel information; and
    • a clustering module for performing spectral clustering on the obtained graph matrix, so as to obtain a final clustering result.


Further, the objective function of kernel k-means clustering in the construction module is expressed as:

    min_{B ∈ {0,1}^{n×k}} Σ_{i=1,c=1}^{n,k} B_ic ‖ϕ(x_i) − μ_c‖₂²,  s.t. Σ_{c=1}^{k} B_ic = 1        (1)

    • where {x_i}_{i=1}^n ⊆ ℝ^d represents a data set consisting of n samples; B ∈ {0,1}^{n×k} represents a clustering indicator matrix: when the ith sample belongs to the cth cluster, B_ic = 1, otherwise B_ic = 0; ϕ: x ∈ ℝ^d → ℋ represents the feature mapping that projects a sample x into a reproducing kernel Hilbert space ℋ; μ_c = (1/n_c) Σ_{i=1}^{n} B_ic ϕ(x_i), where n_c represents the number of samples belonging to the cth cluster; x_i represents a data sample; i represents a sample serial number; n represents the number of sample points; and k represents the total number of clusters.





assuming ⟨ϕ(x_i), ϕ(x_j)⟩ = K_ij, where K_ij represents elements of a kernel matrix K, then Equation (1) is expressed as:

    min_{B ∈ {0,1}^{n×k}} Tr(K) − Tr(L^{1/2} B^T K B L^{1/2}),  s.t. B 1_k = 1_n        (2)

    • where K represents the kernel matrix; L = diag([n_1^{−1}, …, n_k^{−1}]), in which n_k^{−1} represents the reciprocal of the total number of samples belonging to the kth cluster; 1_k ∈ ℝ^k represents a vector with all elements being 1; and B^T represents the transpose of B.

    • assuming H = BL^{1/2} and H^T H = I_k, then Equation (2) is expressed as:

    min_{H^T H = I_k} Tr(K(I_n − H H^T))        (3)

    • where H^T represents the transpose of H; I_n represents an n-dimensional identity matrix; and I_k represents a k-dimensional identity matrix.





Further, the objective function constructed in the construction module is expressed as:

    min_{S, {H_i}_{i=1}^{m}} Σ_{i=1}^{m} Tr(K_i(I_n − H_i H_i^T)) + λ‖H_i − S H_i‖_F² + β‖S‖_F²        (4)

    s.t. S ≥ 0,  S1 = 1,  diag(S) = 0,  H_i^T H_i = I_k        (5)

    • where H_i represents the basic division matrix obtained from the ith running of kernel k-means clustering; λ and β represent hyperparameters that adjust the proportion of each item; H_i^T represents the transpose of H_i; S represents the proxy graph matrix; and I_n represents the n-dimensional identity matrix.





Further, the constructed objective function is cyclically solved in the solution module, specifically:

    • a first fixed module is used for fixing S and optimizing {H_i}_{i=1}^m, being expressed as:

    min_{H_i} Tr(K_i(I_n − H_i H_i^T)) + λ‖H_i − S H_i‖_F²,  s.t. H_i^T H_i = I_k        (6)

    • assuming G = K_i − λ(I_n − 2S + SS^T), then Equation (6) is expressed as:

    max_{H_i} Tr(G H_i H_i^T),  s.t. H_i^T H_i = I_k        (7)

    • performing eigendecomposition on G, taking H_i as the eigenvectors corresponding to its k largest eigenvalues, and thereby obtaining the optimal solution;

    • a second fixed module is used for fixing {H_i}_{i=1}^m and optimizing S, being expressed as:

    min_S Σ_{i=1}^{m} λ‖H_i − S H_i‖_F² + β‖S‖_F²,  s.t. S ≥ 0,  S1 = 1,  diag(S) = 0        (8)

Solving Equation (8):

    • solving an unconstrained solution of Equation (8), being expressed as:

    Ŝ = argmin_S Σ_{i=1}^{m} λ‖H_i − S H_i‖_F² + β‖S‖_F²        (9)

    • setting the derivative to 0 to obtain the closed-form solution Ŝ = (C + (β/λ)I)⁻¹ C, where C = Σ_{i=1}^{m} H_i H_i^T;

    • calculating a solution closest to Ŝ that satisfies the constraints:

    min_S ‖S − Ŝ‖_F²,  s.t. S ≥ 0,  S1 = 1,  diag(S) = 0        (10)


    • where Ŝ represents a solution of the proxy graph matrix when being unconstrained.





Obtaining a closed-form solution:

    S_{j,:} = max(Ŝ_{j,:} + α_j 1, 0),  S_{jj} = 0,  α_j = (1 − Ŝ_{j,:}^T 1)/n        (11)

    • where S_{j,:} represents the jth row of the matrix S; α_j represents an intermediate variable used in the solution; Ŝ_{j,:} represents the jth row of Ŝ; and Ŝ_{j,:}^T represents the transpose of Ŝ_{j,:}.





Further, the constructed objective function is cyclically solved, with a cycle termination condition being expressed as:

    (obj^{(t−1)} − obj^{(t)}) / obj^{(t)} ≤ ε        (12)

    • where obj^{(t−1)} and obj^{(t)} represent the values of the objective function at the (t−1)th and tth iterations, respectively; and ε represents a set precision.





Compared with the prior art, the present disclosure provides a novel later-fusion multiple kernel clustering machine learning method based on proxy graph improvement, and the method includes a basic division acquisition module, a proxy graph construction module, a basic division improvement module through the proxy graph, a spectral clustering module through the proxy graph, and the like. By optimizing the basic division, an optimized basic division not only has information of a single kernel, but can also obtain global information by means of a proxy graph, which is more beneficial to fusing views, such that a learned proxy graph can better fuse information of each kernel matrix, thereby realizing an aim of improving a clustering effect. Results of experiments on six multi-kernel data sets prove that the performance of the present disclosure is better than those of existing methods.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a flowchart of a later-fusion multiple kernel clustering machine learning method based on proxy graph improvement provided in Embodiment 1 of the present disclosure.



FIG. 2 is a schematic diagram of a later-fusion multiple kernel clustering based on proxy graph improvement provided in Embodiment 1 of the present disclosure.



FIG. 3 is a schematic diagram of changes in values of objective function as the number of iterations increases provided in Embodiment 2 of the present disclosure.



FIG. 4 is a schematic diagram of parameter sensitivity provided in Embodiment 2 of the present disclosure.





DETAILED DESCRIPTIONS OF THE EMBODIMENTS

The implementation of the present disclosure will be illustrated below in conjunction with specific embodiments. Those skilled in the art can easily understand other advantages and effects of the present disclosure from the content disclosed in the Specification. The present disclosure can also be implemented or applied through other different specific implementations, and various modifications or variations can be made to details in the specification based on different viewpoints and applications without departing from the spirit of the present disclosure. It should be noted that the embodiments below and features in the embodiments can be combined with each other, so long as they are not in conflict with each other.


In order to overcome defects of the prior art, the present disclosure provides a later-fusion multiple kernel clustering machine learning method and system based on proxy graph improvement.


Embodiment 1

This embodiment provides a later-fusion multiple kernel clustering machine learning method based on proxy graph improvement, as shown in FIGS. 1-2, including the following steps:

    • S1. acquiring a clustering task and a target data sample;
    • S2. initializing a proxy graph improvement matrix;
    • S3. running k-means clustering and graph improvement on each view corresponding to the acquisition of the clustering task and the target data sample, and constructing an objective function by combining kernel k-means clustering and graph improvement methods;
    • S4. cyclically solving the objective function constructed in the step S3 so as to obtain a graph matrix, which is fused with basic kernel information; and
    • S5. performing spectral clustering on the obtained graph matrix, so as to obtain a final clustering result.


In the step S3, k-means clustering and graph improvement are run on each view corresponding to the acquisition of the clustering task and the target data sample, and an objective function is constructed by combining kernel k-means clustering and graph improvement methods.


A kernel k-means clustering objective function is expressed as follows: {x_i}_{i=1}^n ⊆ ℝ^d represents a data set consisting of n samples; assuming that the kernel function is κ(⋅, ⋅), and κ(x, x′) = ⟨ϕ(x), ϕ(x′)⟩ according to the nature of a reproducing kernel, where ϕ: x ∈ ℝ^d → ℋ represents the feature mapping that projects a sample x into a reproducing kernel Hilbert space ℋ. ϕ(x) is substituted into the objective function of k-means clustering to obtain the objective function of kernel k-means clustering, which is expressed as:

    min_{B ∈ {0,1}^{n×k}} Σ_{i=1,c=1}^{n,k} B_ic ‖ϕ(x_i) − μ_c‖₂²,  s.t. Σ_{c=1}^{k} B_ic = 1        (1)

    • where B ∈ {0,1}^{n×k} represents the clustering indicator matrix: when the ith sample belongs to the cth cluster, B_ic = 1, otherwise B_ic = 0; μ_c = (1/n_c) Σ_{i=1}^{n} B_ic ϕ(x_i), where n_c represents the number of samples belonging to the cth cluster; x_i represents a data sample; i represents a sample serial number; n represents the number of sample points; and k represents the total number of clusters.





The kernel trick is used, assuming ⟨ϕ(x_i), ϕ(x_j)⟩ = K_ij, where K_ij represents elements of a kernel matrix K; then Equation (1) is expressed as:

    min_{B ∈ {0,1}^{n×k}} Tr(K) − Tr(L^{1/2} B^T K B L^{1/2}),  s.t. B 1_k = 1_n        (2)

    • where K represents the kernel matrix; L = diag([n_1^{−1}, …, n_k^{−1}]), in which n_k^{−1} represents the reciprocal of the total number of samples belonging to the kth cluster; 1_k ∈ ℝ^k represents a vector with all elements being 1; and B^T represents the transpose of B.





An optimization about B in Equation (2) has been proved to be an NP-hard problem; therefore, the discrete constraints on B are relaxed into real-valued orthogonal constraints. Assuming H = BL^{1/2} and H^T H = I_k, Equation (2) is then expressed as:

    min_{H^T H = I_k} Tr(K(I_n − H H^T))        (3)

    • where HT represents a transpose of H, In represents an n-dimensional identity matrix, and Ik represents a k-dimensional identity matrix.





In this embodiment, eigendecomposition can be performed on the kernel matrix K, and the optimal H consists of the eigenvectors corresponding to the k largest eigenvalues of K.


The graph improvement part is specifically as follows: assuming that the basic division obtained by the ith running of kernel k-means clustering is H_i, in order to obtain global information from the basic divisions, each basic division can be adjusted by minimizing ‖H_i − SH_i‖_F², where S is a graph matrix shared by all basis kernels, satisfying S ≥ 0 and S1 = 1, with all diagonal elements being 0.
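A toy illustration (ours, not from the disclosure) of why the term ‖H_i − SH_i‖_F² rewards a graph S that connects samples within the same cluster: when S averages each sample's row of H over its cluster, SH reproduces H and the term vanishes, while a mismatched graph leaves a residual.

```python
import numpy as np

# Two clusters of three samples; H assigns each sample to its cluster.
H = np.array([[1., 0.]] * 3 + [[0., 1.]] * 3)

# A graph that links samples only within their true cluster (row sums 1, zero diagonal).
S_good = np.zeros((6, 6))
S_good[:3, :3] = 0.5
S_good[3:, 3:] = 0.5
np.fill_diagonal(S_good, 0.0)

# A graph that ignores the cluster structure entirely (row sums 1, zero diagonal).
S_bad = np.full((6, 6), 0.2)
np.fill_diagonal(S_bad, 0.0)

res_good = np.linalg.norm(H - S_good @ H, 'fro') ** 2   # vanishes: S H reproduces H
res_bad = np.linalg.norm(H - S_bad @ H, 'fro') ** 2     # large residual
```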


The “constructing an objective function by combining kernel k-means clustering and graph improvement methods” is expressed as:

    min_{S, {H_i}_{i=1}^{m}} Σ_{i=1}^{m} Tr(K_i(I_n − H_i H_i^T)) + λ‖H_i − S H_i‖_F² + β‖S‖_F²        (4)

    s.t. S ≥ 0,  S1 = 1,  diag(S) = 0,  H_i^T H_i = I_k        (5)

    • where H_i represents the basic division matrix obtained from the ith running of kernel k-means clustering; λ and β represent hyperparameters that adjust the proportion of each item; H_i^T represents the transpose of H_i; S represents the proxy graph matrix; and I_n represents the n-dimensional identity matrix.





Since Equation (4) can make use of S for adjusting the Hi, the algorithm is named “later-fusion multiple kernel clustering based on proxy graph improvement”.


In the step S4, the objective function constructed in the step S3 is cyclically solved so as to obtain a graph matrix, which is fused with basic kernel information.


The objective function can be solved using the following two-step iterative method, specifically:

    • S41. fixing S and optimizing {H_i}_{i=1}^m, each H_i can be optimized individually, which is expressed as:

    min_{H_i} Tr(K_i(I_n − H_i H_i^T)) + λ‖H_i − S H_i‖_F²,  s.t. H_i^T H_i = I_k        (6)

    • assuming G = K_i − λ(I_n − 2S + SS^T), then Equation (6) is expressed as:

    max_{H_i} Tr(G H_i H_i^T),  s.t. H_i^T H_i = I_k        (7)

    • performing eigendecomposition on G, taking H_i as the eigenvectors corresponding to its k largest eigenvalues, and thereby obtaining the optimal solution;

    • S42. fixing {H_i}_{i=1}^m and optimizing S, the optimization problem can then be transformed into the following Equation, being expressed as:

    min_S Σ_{i=1}^{m} λ‖H_i − S H_i‖_F² + β‖S‖_F²,  s.t. S ≥ 0,  S1 = 1,  diag(S) = 0        (8)

    • Equation (8) is solved through the steps S421 and S422:

    • S421. solving an unconstrained solution of Equation (8), being expressed as:

    Ŝ = argmin_S Σ_{i=1}^{m} λ‖H_i − S H_i‖_F² + β‖S‖_F²        (9)

Setting the derivative to 0 yields the closed-form solution Ŝ = (C + (β/λ)I)⁻¹ C, where C = Σ_{i=1}^{m} H_i H_i^T;

    • S422. calculating a solution closest to Ŝ that satisfies the constraints through Equation (10):

    min_S ‖S − Ŝ‖_F²,  s.t. S ≥ 0,  S1 = 1,  diag(S) = 0        (10)

    • where Ŝ represents a solution of the proxy graph matrix when being unconstrained.





Obtaining a closed-form solution:

    S_{j,:} = max(Ŝ_{j,:} + α_j 1, 0),  S_{jj} = 0,  α_j = (1 − Ŝ_{j,:}^T 1)/n        (11)

    • where S_{j,:} represents the jth row of the matrix S; α_j represents an intermediate variable used in the solution; Ŝ_{j,:} represents the jth row of Ŝ; and Ŝ_{j,:}^T represents the transpose of Ŝ_{j,:}.





A termination condition of the above two-step (the steps S41 and S42) alternating method is expressed as:

    (obj^{(t−1)} − obj^{(t)}) / obj^{(t)} ≤ ε        (12)

    • where obj^{(t−1)} and obj^{(t)} represent the values of the objective function at the (t−1)th and tth iterations, respectively; and ε represents a set precision.
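The full two-step alternating procedure (S41, S42, with the stopping rule of Equation (12)) can be sketched end-to-end as follows (numpy; all names are ours, and an absolute value guards the stopping test since the one-shot projection step can break exact monotonicity):

```python
import numpy as np

def top_k_eigvecs(G, k):
    # eigenvectors associated with the k largest eigenvalues of a symmetric matrix
    _, vecs = np.linalg.eigh((G + G.T) / 2)   # eigh sorts eigenvalues ascending
    return vecs[:, -k:]

def project_rows(S_hat):
    # Eq. (11): shift each row toward sum 1, clip negatives, zero the diagonal
    n = S_hat.shape[0]
    S = np.maximum(S_hat + (1 - S_hat.sum(axis=1, keepdims=True)) / n, 0)
    np.fill_diagonal(S, 0.0)
    return S

def objective(Ks, Hs, S, lam, beta):
    # value of Eq. (4)
    n = S.shape[0]
    val = sum(np.trace(K @ (np.eye(n) - H @ H.T))
              + lam * np.linalg.norm(H - S @ H, 'fro') ** 2
              for K, H in zip(Ks, Hs))
    return val + beta * np.linalg.norm(S, 'fro') ** 2

def late_fusion_proxy_graph(Ks, k, lam=1.0, beta=1.0, eps=1e-4, max_iter=50):
    n = Ks[0].shape[0]
    S = np.zeros((n, n))          # S2: initialized proxy graph
    Hs = None
    prev = np.inf
    for _ in range(max_iter):
        # S41: with S fixed, H_i = top-k eigenvectors of G_i = K_i - lam(I - 2S + S S^T)
        Hs = [top_k_eigvecs(K - lam * (np.eye(n) - 2 * S + S @ S.T), k) for K in Ks]
        # S42: closed form of Eq. (9), then the projection of Eqs. (10)-(11)
        C = sum(H @ H.T for H in Hs)
        S = project_rows(np.linalg.solve(C + (beta / lam) * np.eye(n), C))
        obj = objective(Ks, Hs, S, lam, beta)
        # Eq. (12) stopping rule
        if np.isfinite(prev) and abs(prev - obj) / abs(obj) <= eps:
            break
        prev = obj
    return S, Hs
```

The returned S is the graph matrix fused with basis kernel information, ready for the spectral clustering of step S5.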





In the step S5, spectral clustering is performed on the obtained graph matrix, so as to obtain a final clustering result.


A standard spectral clustering algorithm is performed on the outputted graph matrix S to obtain the final clustering result.
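A standard spectral clustering pass over the learned S can be sketched as follows (numpy; a small Lloyd's k-means stands in for a library call, and all names are ours):

```python
import numpy as np

def spectral_clustering(S, k, n_restarts=10, seed=0):
    """Standard spectral clustering on the learned graph matrix S (step S5):
    symmetrize, build the normalized Laplacian, embed with its bottom-k
    eigenvectors, then cluster the row-normalized embedding with k-means."""
    W = (S + S.T) / 2                                   # symmetric affinity
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    Lsym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(Lsym)
    U = vecs[:, :k]                                     # bottom-k eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # small Lloyd's k-means on the spectral embedding, best of several restarts
    rng = np.random.default_rng(seed)
    best, best_inertia = None, np.inf
    for _ in range(n_restarts):
        centers = U[rng.choice(len(U), size=k, replace=False)]
        for _ in range(100):
            labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
            new = np.array([U[labels == c].mean(axis=0) if (labels == c).any()
                            else centers[c] for c in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((U - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best, best_inertia = labels, inertia
    return best
```

On a block-structured graph, the embedding is constant within each block, so the clusters are recovered exactly.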


This embodiment provides a novel later-fusion multiple kernel clustering machine learning method based on proxy graph improvement, and the method includes a basic division acquisition module, a proxy graph construction module, a basic division improvement module through the proxy graph, a spectral clustering module through the proxy graph, and the like. By optimizing the basic division, an optimized basic division not only has information of a single kernel, but can also obtain global information by means of a proxy graph, which is more beneficial to fusing views, such that a learned proxy graph can better fuse information of each kernel matrix, thereby realizing an aim of improving a clustering effect.


Embodiment 2

A later-fusion multiple kernel clustering machine learning method based on proxy graph improvement provided in this embodiment is distinguished from that in Embodiment 1 by:


In this embodiment, the clustering performance of the method of the present disclosure is tested on six MKL standard data sets.


The six MKL standard data sets are AR10P, YALE, Protein fold prediction, Oxford Flower17, Nonplant and Oxford Flower102. Information of the data sets is illustrated in Table 1.














TABLE 1

Dataset       Samples  Kernels  Clusters
AR10P             130        6        10
YALE              165        5        15
ProteinFold       694       12        27
Flower17         1360        7        17
Nonplant         2372       69         3
Flower102        8189        4       102

For ProteinFold, this embodiment generates 12 benchmark kernel matrices, in which the first 10 feature sets adopt second-order polynomial kernels, and the last two adopt cosine inner product kernels. Kernel matrices of other data sets can be publicly downloaded from the Internet.


This embodiment adopts such algorithms as a best single-view kernel k-means clustering (BSKM), a multiple kernel k-means clustering (MKKM), a co-regularized spectral clustering (CRSC), a robust multiple kernel k-means clustering (RMKKM), a robust multi-view spectral clustering (RMSC), a multiple kernel k-means with matrix-induced regularization (MKMR), a multiple kernel clustering with local kernel alignment maximization (MKAM), a multi-view clustering via later fusion alignment maximization (MLFA) and a flexible multi-view representation learning for subspace clustering (FMR). In all experiments, all basis kernels are first centered and regularized. For all datasets, the number of classes is assumed to be known and set to be the number of cluster classes. Parameters of the compared algorithms adopted in this experiment are all set according to the corresponding literature. Parameters λ and β of the method are chosen from [2−2, 2−1, . . . , 22] through grid search.


The embodiment adopts the widely used clustering accuracy (ACC), normalized mutual information (NMI) and purity to evaluate the clustering performance of each algorithm. For all algorithms, each experiment is repeated 50 times with random initialization, and the best result is reported to reduce the effect of randomness caused by k-means.



















TABLE 2

ACC (%)
Dataset      BSKM   MKKM   CRSC   RMKKM  RMSC   MKMR   MKAM   MLFA   FMR    Proposed
AR10P        43.08  40.00  38.46  30.77  30.77  39.23  27.69  41.54  51.23  56.15
YALE         56.97  52.12  56.97  56.36  58.03  60.00  46.67  54.55  61.21  62.42
ProteinFold  33.86  27.23  34.87  30.98  33.00  36.46  37.90  35.88  34.96  40.06
Flower17     42.06  45.37  52.35  53.38  51.10  58.82  57.87  60.16  58.78  62.28
Nonplant     49.38  54.32  55.56  49.33  60.65  56.59  59.57  50.07  36.70  67.50
Flower102    33.13  21.96  37.26  28.17  32.97  39.91  40.84  42.73  35.24  46.78

NMI (%)
Dataset      BSKM   MKKM   CRSC   RMKKM  RMSC   MKMR   MKAM   MLFA   FMR    Proposed
AR10P        42.61  39.53  39.82  26.62  27.87  40.11  24.72  39.15  45.52  51.82
YALE         58.42  54.16  57.69  59.32  57.58  62.87  53.51  59.86  60.31  63.48
ProteinFold  42.03  37.16  43.32  38.78  43.91  45.32  44.46  44.00  43.68  48.72
Flower17     45.14  45.35  50.42  52.56  54.39  57.05  56.06  59.79  56.98  61.72
Nonplant     16.55  15.83  17.44  16.55  20.35  23.43  23.04  16.55   0.50  25.56
Flower102    48.99  42.30  54.18  48.17  53.36  57.27  57.60  57.59  57.42  60.30

Purity (%)
Dataset      BSKM   MKKM   CRSC   RMKKM  RMSC   MKMR   MKAM   MLFA   FMR    Proposed
AR10P        43.08  40.00  39.23  32.31  33.08  39.23  28.46  41.54  51.23  56.15
YALE         57.58  52.73  57.58  58.18  57.24  60.00  49.09  55.76  61.33  62.42
ProteinFold  41.21  33.86  40.78  36.60  42.36  42.65  43.95  41.93  42.22  45.97
Flower17     44.63  46.84  53.01  55.07  54.12  60.51  59.26  62.13  59.66  63.60
Nonplant     72.18  71.45  73.17  72.18  70.50  73.33  74.34  72.18  60.36  75.29
Flower102    38.78  27.61  44.08  33.86  40.24  46.39  48.21  49.73  41.62  53.07


Table 2 shows the clustering effects of the above method and the compared algorithms on the six data sets. It can be concluded from the table that: 1. the proposed algorithm is superior to all compared algorithms under the three evaluation criteria; and 2. the proposed algorithm outperforms the second-best compared algorithm by 4.92%, 1.21%, 2.16%, 2.12%, 6.85% and 4.05% in ACC on the six data sets, respectively.


This embodiment also presents changes in the objective function over the iterations, as shown in FIG. 3. It can be seen that the values of the objective function decrease monotonically and usually converge within 10 iterations, which greatly reduces the running time of the algorithm.



FIG. 4 shows the parameter sensitivity, taking the AR10P and Flower17 data sets as examples. It can be seen from the figure that the proposed algorithm is relatively stable with respect to both hyperparameters and can achieve good performance over a wide range.


The experimental results of this embodiment on six multi-kernel data sets demonstrate that the performance of the present disclosure is better than that of existing methods.


Embodiment 3

This embodiment provides a later-fusion multiple kernel clustering machine learning system based on proxy graph improvement, including:

    • an acquisition module for acquiring a clustering task and a target data sample;
    • an initialization module for initializing a proxy graph improvement matrix;
    • a construction module for running k-means clustering and graph improvement on each view corresponding to the acquisition of the clustering task and the target data sample, and constructing an objective function by combining kernel k-means clustering and graph improvement methods;
    • a solution module for cyclically solving the constructed objective function, so as to obtain a graph matrix, which is fused with basic kernel information; and
    • a clustering module for performing spectral clustering on the obtained graph matrix, so as to obtain a final clustering result.


Further, the objective function of kernel k-means clustering in the construction module is expressed as:











$$\min_{B\in\{0,1\}^{n\times k}} \sum_{i=1,\,c=1}^{n,\,k} B_{ic}\left\|\phi(x_i)-\mu_c\right\|_2^{2},\quad \text{s.t.}\ \sum_{c=1}^{k} B_{ic}=1 \tag{1}$$









    • where {xi}i=1n represents a data set consisting of n samples; B∈{0,1}n×k represents a clustering indicator matrix: when an ith sample belongs to a cth cluster, Bic=1, otherwise Bic=0; ϕ: x↦ϕ(x)∈ℋ represents the feature mapping that projects a sample x into a reproducing kernel Hilbert space ℋ; μc=(1/nc)Σi=1nBicϕ(xi), where nc represents the number of samples belonging to the cth cluster; xi represents a data sample; i represents a sample serial number; n represents the number of sample points; and k represents the total number of clusters.





Assuming <ϕ(xi),ϕ(xj)>=Kij, where Kij represents elements of a kernel matrix K, then Equation (1) is expressed as:












$$\min_{B\in\{0,1\}^{n\times k}} \mathrm{Tr}(K)-\mathrm{Tr}\!\left(L^{\frac{1}{2}}B^{T}KBL^{\frac{1}{2}}\right),\quad \text{s.t.}\ B\mathbf{1}_k=\mathbf{1}_n \tag{2}$$









    • where K represents the kernel matrix; L=diag([n1−1, . . . , nk−1]), where nk−1 represents the reciprocal of the number of samples belonging to the kth cluster; 1k∈Rk represents a vector with all elements being 1; and BT represents the transpose of B.

    • assuming $H=BL^{\frac{1}{2}}$ and $H^{T}H=I_k$, then Equation (2) is expressed as:

$$\min_{H^{T}H=I_k} \mathrm{Tr}\!\left(K\left(I_n-HH^{T}\right)\right) \tag{3}$$









    • where HT represents a transpose of H, In represents an n-dimensional identity matrix, and Ik represents a k-dimensional identity matrix.
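The relaxed problem in Equation (3) can be illustrated numerically: with H taken as the eigenvectors of the k largest eigenvalues of K, Tr(K(In−HHT)) equals the sum of the discarded eigenvalues. The following is a minimal sketch, assuming NumPy; the helper name `relaxed_kernel_kmeans` and the toy data are illustrative, not part of the disclosure.

```python
import numpy as np

def relaxed_kernel_kmeans(K, k):
    """Minimize Tr(K (I - H H^T)) over H^T H = I_k (Equation (3)).

    The optimum stacks the eigenvectors of K belonging to its
    k largest eigenvalues.
    """
    vals, vecs = np.linalg.eigh(K)              # eigenvalues in ascending order
    H = vecs[:, -k:]                            # top-k eigenvectors
    objective = np.trace(K @ (np.eye(K.shape[0]) - H @ H.T))
    return H, objective

# toy positive semi-definite kernel matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
K = X @ X.T
H, obj = relaxed_kernel_kmeans(K, k=2)
# the objective equals the sum of the discarded (smallest n-k) eigenvalues
assert np.isclose(obj, np.sort(np.linalg.eigvalsh(K))[:-2].sum())
```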





Further, the objective function constructed in the construction module is expressed as:











$$\min_{S,\{H_i\}_{i=1}^{m}} \sum_{i=1}^{m}\left[\mathrm{Tr}\!\left(K_i\left(I_n-H_iH_i^{T}\right)\right)+\lambda\left\|H_i-SH_i\right\|_F^{2}\right]+\beta\left\|S\right\|_F^{2} \tag{4}$$

$$\text{s.t.}\quad S\ge 0,\ S\mathbf{1}=\mathbf{1},\ \mathrm{diag}(S)=0,\ H_i^{T}H_i=I_k \tag{5}$$









    • where Hi represents a basic division matrix obtained by running kernel k-means clustering on the ith kernel; λ and β represent hyperparameters that adjust the proportion of each term; HiT represents a transpose of Hi; S represents a proxy graph matrix; and In represents the n-dimensional identity matrix.
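For concreteness, the value of the objective in Equations (4)-(5) can be evaluated directly. The sketch below assumes NumPy; the helper name `pgi_objective` and the toy inputs are illustrative assumptions, and the constraints on S are not checked here.

```python
import numpy as np

def pgi_objective(Ks, Hs, S, lam, beta):
    """Value of the objective in Equations (4)-(5) for given base
    kernels K_i, basic divisions H_i, and proxy graph S."""
    n = S.shape[0]
    total = 0.0
    for K, H in zip(Ks, Hs):
        total += np.trace(K @ (np.eye(n) - H @ H.T))          # kernel k-means term
        total += lam * np.linalg.norm(H - S @ H, "fro") ** 2  # graph-improvement term
    return total + beta * np.linalg.norm(S, "fro") ** 2       # regularization on S

# toy example: two base kernels, k = 2 clusters, S initialized to zero
rng = np.random.default_rng(1)
Ks = []
for _ in range(2):
    X = rng.standard_normal((6, 3))
    Ks.append(X @ X.T)
Hs = [np.linalg.eigh(K)[1][:, -2:] for K in Ks]
val = pgi_objective(Ks, Hs, np.zeros((6, 6)), lam=1.0, beta=1.0)
```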





Further, the constructed objective function is cyclically solved in the solution module, specifically:

    • a first fixed module is used for fixing S and optimizing {Hi}i=1m, being expressed as:












$$\min_{H_i} \mathrm{Tr}\!\left(K_i\left(I_n-H_iH_i^{T}\right)\right)+\lambda\left\|H_i-SH_i\right\|_F^{2},\quad \text{s.t.}\ H_i^{T}H_i=I_k \tag{6}$$









    • assuming $G=K_i-\lambda\left(I_n-2S+SS^{T}\right)$, then Equation (6) is expressed as:

$$\max_{H_i} \mathrm{Tr}\!\left(GH_iH_i^{T}\right),\quad \text{s.t.}\ H_i^{T}H_i=I_k \tag{7}$$









    • performing eigendecomposition on G; taking Hi as the eigenvectors corresponding to its k largest eigenvalues then gives the optimal solution;
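The Hi update above can be sketched in a few lines, assuming NumPy; the helper name `update_H` and the toy proxy graph are illustrative assumptions. The symmetrization of G is exact when S is symmetric, as assumed in the derivation.

```python
import numpy as np

def update_H(K_i, S, lam, k):
    """One H_i update (Equations (6)-(7)): stack the eigenvectors of the
    k largest eigenvalues of G = K_i - lam * (I - 2 S + S S^T)."""
    n = K_i.shape[0]
    G = K_i - lam * (np.eye(n) - 2.0 * S + S @ S.T)
    G = (G + G.T) / 2.0            # symmetrize (exact when S is symmetric)
    return np.linalg.eigh(G)[1][:, -k:]

rng = np.random.default_rng(2)
X = rng.standard_normal((7, 3))
K = X @ X.T                        # toy base kernel
S = np.full((7, 7), 1.0 / 7.0)     # toy proxy graph
np.fill_diagonal(S, 0.0)
H = update_H(K, S, lam=0.5, k=3)
```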

    • a second fixed module is used for fixing {Hi}i=1m and optimizing S, being expressed as:















$$\min_{S} \sum_{i=1}^{m}\lambda\left\|H_i-SH_i\right\|_F^{2}+\beta\left\|S\right\|_F^{2},\quad \text{s.t.}\ S\ge 0,\ S\mathbf{1}=\mathbf{1},\ \mathrm{diag}(S)=0 \tag{8}$$







Solving Equation (8):

    • solving an unconstrained solution of Equation (8), being expressed as:










$$\hat S=\operatorname*{argmin}_{S} \sum_{i=1}^{m}\lambda\left\|H_i-SH_i\right\|_F^{2}+\beta\left\|S\right\|_F^{2} \tag{9}$$







Setting the derivative to 0 yields a closed-form solution

$$\hat S=\left(C+\frac{\beta}{\lambda}I\right)^{-1}C,\quad \text{where } C=\sum_{i=1}^{m}H_iH_i^{T};$$







    • calculating a solution closest to Ŝ that satisfies constraints:














$$\min_{S}\left\|S-\hat S\right\|_F^{2},\quad \text{s.t.}\ S\ge 0,\ S\mathbf{1}=\mathbf{1},\ \mathrm{diag}(S)=0 \tag{10}$$









    • where Ŝ represents a solution of the proxy graph matrix when being unconstrained.





Obtaining a closed-form solution:











$$S_{j,:}=\max\!\left(\hat S_{j,:}+\alpha_j\mathbf{1},\,0\right),\quad S_{jj}=0,\quad \alpha_j=\frac{1-\hat S_{j,:}^{T}\mathbf{1}}{n} \tag{11}$$









    • where Sj,: represents a jth row of the matrix S; αj represents an intermediate variable used in the solution; Ŝj,: represents a jth row of Ŝ; and Ŝj,:T represents a transpose of Ŝj,:.
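The full S update — the closed form of Equation (9) followed by the row-wise correction of Equation (11) — can be sketched as follows, assuming NumPy; the helper name `update_S` and the toy inputs are illustrative assumptions. Note that αj shifts each row toward unit sum, and clipping plus zeroing the diagonal enforce the remaining constraints approximately.

```python
import numpy as np

def update_S(Hs, lam, beta):
    """One S update: unconstrained closed form S_hat = (C + (beta/lam) I)^{-1} C
    with C = sum_i H_i H_i^T, then the row-wise correction of Equation (11)
    to enforce S >= 0 and diag(S) = 0."""
    n = Hs[0].shape[0]
    C = sum(H @ H.T for H in Hs)
    S_hat = np.linalg.solve(C + (beta / lam) * np.eye(n), C)
    S = np.empty_like(S_hat)
    for j in range(n):
        alpha = (1.0 - S_hat[j, :].sum()) / n   # shift each row toward unit sum
        S[j, :] = np.maximum(S_hat[j, :] + alpha, 0.0)
        S[j, j] = 0.0                           # no self-affinity
    return S

rng = np.random.default_rng(3)
# toy orthonormal basic divisions (QR factor has orthonormal columns)
Hs = [np.linalg.qr(rng.standard_normal((6, 2)))[0] for _ in range(3)]
S = update_S(Hs, lam=1.0, beta=0.5)
```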





Further, the constructed objective function is cyclically solved, with a cycle termination condition being expressed as:












$$\frac{\mathrm{obj}^{(t-1)}-\mathrm{obj}^{(t)}}{\mathrm{obj}^{(t)}}\le \varepsilon \tag{12}$$









    • where obj(t-1) and obj(t) represent the values of the objective function at the (t−1)th and tth iterations, respectively; and ε represents a set precision.
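Putting the two update steps together with the stopping rule of Equation (12) gives a compact sketch of the alternating solver. This is an illustration under stated assumptions, not the disclosed implementation: NumPy, the name `pgi_cluster`, the zero initialization of S, and the default hyperparameters are all illustrative, and the final spectral clustering on the returned graph matrix (step S5) is omitted.

```python
import numpy as np

def pgi_cluster(Ks, k, lam=1.0, beta=1.0, eps=1e-4, max_iter=50):
    """Alternately update {H_i} and S until the relative decrease of the
    objective falls below eps (Equation (12)); return the proxy graph S."""
    n = Ks[0].shape[0]
    S = np.zeros((n, n))                       # initial proxy graph (assumption)
    prev = None
    for _ in range(max_iter):
        # H_i step: top-k eigenvectors of G = K_i - lam (I - 2S + SS^T)
        Hs = []
        for K in Ks:
            G = K - lam * (np.eye(n) - 2.0 * S + S @ S.T)
            Hs.append(np.linalg.eigh((G + G.T) / 2.0)[1][:, -k:])
        # S step: regularized closed form, then clip and zero the diagonal
        C = sum(H @ H.T for H in Hs)
        S = np.linalg.solve(C + (beta / lam) * np.eye(n), C)
        S = np.maximum(S + (1.0 - S.sum(axis=1, keepdims=True)) / n, 0.0)
        np.fill_diagonal(S, 0.0)
        # objective of Equations (4)-(5)
        obj = sum(np.trace(K @ (np.eye(n) - H @ H.T))
                  + lam * np.linalg.norm(H - S @ H, "fro") ** 2
                  for K, H in zip(Ks, Hs))
        obj += beta * np.linalg.norm(S, "fro") ** 2
        if prev is not None and abs(prev - obj) / max(abs(obj), 1e-12) < eps:
            break
        prev = obj
    return S

rng = np.random.default_rng(4)
Ks = []
for _ in range(2):
    X = rng.standard_normal((10, 4))
    Ks.append(X @ X.T)
S = pgi_cluster(Ks, k=3, lam=0.5, beta=0.5)
```

Spectral clustering would then be run on the returned S to obtain the final labels, as in step S5.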





It should be noted that the later-fusion multiple kernel clustering machine learning system based on proxy graph improvement provided in this embodiment is similar to that in Embodiment 1, and will not be described in detail here.


The system provided in the present disclosure includes a basic division acquisition module, a proxy graph construction module, a basic division improvement module through the proxy graph, a spectral clustering module through the proxy graph, and the like. By optimizing the basic division, an optimized basic division not only has information of a single kernel, but can also obtain global information by means of a proxy graph, which is more beneficial to fusing views, such that a learned proxy graph can better fuse information of each kernel matrix, thereby realizing an aim of improving a clustering effect.


It should be noted that what is described above is merely illustrative of the preferred embodiments of the present disclosure and the technical principles applied. Those skilled in the art will understand that the present disclosure is not limited to the particular embodiments described herein, and various obvious changes, readjustments and substitutions may be made by those skilled in the art without departing from the scope of protection of the present disclosure. Therefore, although the present disclosure has been described in greater detail by way of the above embodiments, the present disclosure is not limited to the above embodiments and may include many other equivalent embodiments without departing from the concept of the present disclosure; the scope of the present disclosure is determined by the scope of the appended claims.

Claims
  • 1. A later-fusion multiple kernel clustering machine learning method based on proxy graph improvement, comprising the following steps: S1: acquiring a clustering task and a target data sample;S2: initializing a proxy graph improvement matrix;S3: running k-means clustering and graph improvement on each view corresponding to the acquisition of the clustering task and the target data sample, and constructing an objective function by combining kernel k-means clustering and graph improvement methods;S4: cyclically solving the objective function constructed in the step S3 to obtain a graph matrix fused with basic kernel information; andS5: performing spectral clustering on the graph matrix to obtain a final clustering result.
  • 2. The later-fusion multiple kernel clustering machine learning method based on proxy graph improvement according to claim 1, wherein the objective function of kernel k-means clustering constructed in the step S3 is expressed as:
  • 3. The later-fusion multiple kernel clustering machine learning method based on proxy graph improvement according to claim 2, wherein the objective function constructed in the step S3 is expressed as:
  • 4. The later-fusion multiple kernel clustering machine learning method based on proxy graph improvement according to claim 3, wherein the objective function constructed in the step S3 is cyclically solved in the step S4 as follow: S41: fixing S and optimizing {Hi}i=1m, being expressed as:
  • 5. The later-fusion multiple kernel clustering machine learning method based on proxy graph improvement according to claim 4, wherein the objective function constructed in the step S3 is cyclically solved, with a cycle termination condition being expressed as:
  • 6. A later-fusion multiple kernel clustering machine learning system based on proxy graph improvement, comprising: an acquisition module for acquiring a clustering task and a target data sample;an initialization module for initializing a proxy graph improvement matrix;a construction module for running k-means clustering and graph improvement on each view corresponding to the acquisition of the clustering task and the target data sample, and constructing an objective function by combining kernel k-means clustering and graph improvement methods;a solution module for cyclically solving the objective function to obtain a graph matrix fused with basic kernel information; anda clustering module for performing spectral clustering on the graph matrix to obtain a final clustering result.
  • 7. The later-fusion multiple kernel clustering machine learning system based on proxy graph improvement according to claim 6, wherein the objective function of kernel k-means clustering in the construction module is expressed as:
  • 8. The later-fusion multiple kernel clustering machine learning system based on proxy graph improvement according to claim 7, wherein the objective function constructed in the construction module is expressed as:
  • 9. The later-fusion multiple kernel clustering machine learning system based on proxy graph improvement according to claim 8, wherein the objective function is cyclically solved in the solution module as follow: a first fixed module is used for fixing S and optimizing {Hi}i=1m, being expressed as:
  • 10. The later-fusion multiple kernel clustering machine learning system based on proxy graph improvement according to claim 9, wherein the objective function is cyclically solved, with a cycle termination condition being expressed as:
Priority Claims (1)
Number Date Country Kind
202110607669.7 Jun 2021 CN national
PCT Information
Filing Document Filing Date Country Kind
PCT/CN2022/095836 5/30/2022 WO