This application is the national phase entry of International Application No. PCT/CN2021/135989, filed on Dec. 7, 2021, which is based upon and claims priority to Chinese Patent Application No. 202110171227.2, filed on Feb. 8, 2021, the entire contents of which are incorporated herein by reference.
The present application relates to the technical field of signal processing and data analysis, and in particular to a consensus graph learning-based multi-view clustering method.
With the advancement of information acquisition technologies, multimedia data such as text, audio, image, and video can often be acquired from various sources in real-world application scenarios. For example, in multimedia image retrieval tasks, color, texture, and edges can be used to describe images, while in video scene analysis tasks, cameras at different angles can provide additional information for analyzing the same scene. This type of data is referred to as multi-view data, and it has given rise to a series of multi-view learning algorithms, including cross-view domain adaptation, multi-view clustering, and multi-view anomaly detection.

The acquisition of semantic information from data is an important research topic in multimedia data mining. Multi-view clustering analyzes the multi-view features of data in an unsupervised manner to capture intrinsic cluster information, and it has gained increasing attention in recent years. Spectral clustering has become a popular clustering algorithm due to its solid mathematical framework and its ability to partition clusters of arbitrary shapes. Consequently, an increasing number of multi-view clustering algorithms based on spectral clustering have been proposed and applied to analyze and process multimedia data. Most of these algorithms involve the following two steps: first, constructing a shared similarity graph from the multi-view data, and then applying spectral clustering to the similarity graph to obtain the clustering results. Due to the heterogeneity of multimedia acquisition sources, multi-view data often exhibit redundancy, correlation, and diversity. A key challenge is therefore how to effectively mine the information in multi-view data to construct a high-quality similarity graph for clustering, thereby improving the clustering performance of multi-view clustering algorithms.

To address this challenge, Gao et al. combined subspace learning with spectral clustering to learn a shared clustering partition for multi-view data. Cao et al. enforced the differences between multiple subspace representations using the Hilbert-Schmidt criterion to explore the complementary information between views. Wang et al. introduced an exclusive regularization constraint to ensure sufficient differences among multiple subspace representations while obtaining a consistent clustering partition from them. Nie et al. combined clustering and local structure learning to obtain a similarity graph with a Laplacian rank constraint. These methods typically employ pairwise strategies to explore the differences and consistency information between views to improve clustering performance. In contrast, in recent years, some algorithms have achieved better clustering performance and gained increasing attention by stacking multiple representations into tensors and further exploiting the high-order correlations of the data.
While previous multi-view clustering algorithms have improved clustering performance in various aspects, they often directly learn the similarity graph from the original features that contain noise and redundant information. As a result, the obtained similarity graph is not accurate, limiting the clustering performance.
To address the issue, the present application provides a Consensus Graph Learning-based Multi-View Clustering (CGLMVC) method that learns a consistent similarity graph for clustering from a new feature space.
Aiming at the existing defects in the prior art, the present application provides a consensus graph learning-based multi-view clustering method.
To achieve the above objective, the present application adopts the following technical solutions:
Provided is a consensus graph learning-based multi-view clustering method, including:
S1, inputting an original data matrix to obtain a spectral embedding matrix;
S2, calculating a similarity graph matrix and a Laplacian matrix based on the spectral embedding matrix;
S3, applying spectral clustering to the calculated similarity graph matrix to obtain spectral embedding representations;
S4, stacking inner products of the normalized spectral embedding representations into a third-order tensor and using low-rank tensor representation learning to obtain a consistent distance matrix;
S5, integrating spectral embedding representation learning and low-rank tensor representation learning into a unified learning framework to obtain an objective function;
S6, solving the obtained objective function through an alternating iterative optimization strategy;
S7, constructing a consistent similarity graph based on the solved result;
S8, applying spectral clustering to the consistent similarity graph to obtain a clustering result.
Further, obtaining spectral embedding representations in step S3 is expressed as:
$$\max_{H^{(v)}} \operatorname{Tr}\left(H^{(v)T} A^{(v)} H^{(v)}\right), \quad \text{s.t. } H^{(v)T} H^{(v)} = I_c$$
wherein $H^{(v)} \in \mathbb{R}^{n \times c}$ represents a spectral embedding matrix of the v-th view; $A^{(v)}$ represents a Laplacian matrix of the v-th view; n represents a number of data samples; c represents a number of clusters; Tr(·) represents a trace of a matrix; $H^{(v)T}$ represents a transpose of $H^{(v)}$; $I_c$ represents a c×c identity matrix.
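For illustration, a minimal NumPy sketch of this spectral embedding step is given below; representing $A^{(v)}$ as a dense symmetric matrix and solving with np.linalg.eigh are implementation assumptions, not part of the claimed method. Under the constraint $H^{(v)T}H^{(v)} = I_c$, the trace maximization is solved exactly by the eigenvectors associated with the c largest eigenvalues (Ky Fan theorem).

```python
import numpy as np

def spectral_embedding(A, c):
    """Return the eigenvectors of the symmetric matrix A for its c largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    return eigvecs[:, -c:]                 # columns for the c largest eigenvalues

# toy usage with a random symmetric matrix standing in for A^(v)
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 10))
H = spectral_embedding((X + X.T) / 2, c=3)
assert np.allclose(H.T @ H, np.eye(3), atol=1e-8)   # H^(v)T H^(v) = I_c
```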
Further, using low-rank tensor representation learning to obtain a consistent distance matrix in step S4 is expressed as:
$$\min_{\mathcal{G}} \ \|\mathcal{G}\|_{w,*} + \lambda\,\|\mathcal{T} - \mathcal{G}\|_F^2, \quad \mathcal{T} = \Phi\left(\bar{H}^{(1)}\bar{H}^{(1)T}, \ldots, \bar{H}^{(V)}\bar{H}^{(V)T}\right)$$
wherein $\mathcal{T} \in \mathbb{R}^{n \times V \times n}$ represents a third-order tensor; $\mathcal{G} \in \mathbb{R}^{n \times V \times n}$ represents a third-order tensor; V represents a number of views; $\|\cdot\|_F$ represents a Frobenius norm of a tensor; $\|\cdot\|_{w,*}$ represents a weighted tensor nuclear norm; $\Phi(\cdot)$ represents a stacking of matrices into a tensor; $\bar{H}^{(v)}$ represents the row-normalized spectral embedding matrix of the v-th view; λ represents a penalty parameter.
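As a concrete illustration of the stacking operator $\Phi(\cdot)$, the sketch below forms the n×V×n tensor from the inner products of row-normalized spectral embeddings; the slice ordering (views along the second dimension) follows the stated shape $\mathcal{T} \in \mathbb{R}^{n \times V \times n}$.

```python
import numpy as np

def stack_inner_products(H_list):
    """Phi(.): stack V inner-product matrices into an n x V x n tensor."""
    n = H_list[0].shape[0]
    V = len(H_list)
    T = np.zeros((n, V, n))
    for v, H in enumerate(H_list):
        Hbar = H / np.linalg.norm(H, axis=1, keepdims=True)  # row-normalize
        T[:, v, :] = Hbar @ Hbar.T                           # inner products
    return T

rng = np.random.default_rng(0)
T = stack_inner_products([rng.standard_normal((10, 3)) for _ in range(4)])
print(T.shape)  # (10, 4, 10)
```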
Further, step S6 specifically includes:
S61, fixing $\mathcal{G}$ and unfolding tensors $\mathcal{T}$ and $\mathcal{G}$ into matrix form by discarding irrelevant terms, then the objective function being expressed as:
$$\max_{H^{(v)T}H^{(v)}=I_c} \ \operatorname{Tr}\left(H^{(v)T}A^{(v)}H^{(v)}\right) - \lambda\,\big\|\bar{H}^{(v)}\bar{H}^{(v)T} - T^{(v)}\big\|_F^2$$
wherein $T^{(v)}$ represents the v-th lateral slice of $\mathcal{G}$;
S62, making $P^{(v)} \in \mathbb{R}^{n \times n}$ represent a diagonal matrix, then diagonal elements being defined as:
$$P^{(v)}(i,i) = \frac{1}{\left\|h_i^{(v)}\right\|_2}$$
wherein $h_i^{(v)}$ and $h_j^{(v)}$ represent the i-th and j-th rows of the spectral embedding matrix $H^{(v)}$, respectively; $h_i^{(v)T}$ represents a transpose of $h_i^{(v)}$; solving $\{H^{(v)}\}_{v=1}^{V}$;
S63, fixing $\{H^{(v)}\}_{v=1}^{V}$ and discarding other irrelevant terms, then the objective function being expressed as:
$$\min_{\mathcal{G}} \ \|\mathcal{G}\|_{w,*} + \lambda\,\|\mathcal{T} - \mathcal{G}\|_F^2$$
wherein $\bar{\mathcal{G}}(:,:,j)$ and $\bar{\mathcal{T}}(:,:,j)$ represent the j-th frontal slice of $\bar{\mathcal{G}}$ and $\bar{\mathcal{T}}$, respectively, and $\bar{\mathcal{G}}$ and $\bar{\mathcal{T}}$ represent results of the fast Fourier transform along the third dimension for $\mathcal{G}$ and $\mathcal{T}$, respectively;
S64, solving $\bar{\mathcal{G}}(:,:,j)$ to obtain a solution of the objective function.
Further, constructing a consistent similarity graph in step S7 is expressed as:
$$\min_{S} \ \sum_{i,j=1}^{n}\left( d_{ij}\, s_{ij} + \gamma\, s_{ij}^2 \right), \quad \text{s.t. } \ s_i^T \mathbf{1} = 1,\ s_{ij} \ge 0$$
wherein $d_{ij}$ represents the consensus distance between the i-th and j-th samples obtained from the solved result; $\gamma$ represents a regularization parameter; $s_i$ represents the i-th row of the consistent similarity graph S.
Further, for solving $\bar{\mathcal{G}}(:,:,j)$ in step S64, $\bar{\mathcal{G}}(:,:,j)$ has the following approximate solution:
$$\bar{\mathcal{G}}(:,:,j) = \bar{\mathcal{U}}(:,:,j)\, \bar{\mathcal{S}}_w(:,:,j)\, \bar{\mathcal{V}}(:,:,j)^{T}$$
wherein $\bar{\mathcal{T}}(:,:,j) = \bar{\mathcal{U}}(:,:,j)\, \bar{\mathcal{S}}(:,:,j)\, \bar{\mathcal{V}}(:,:,j)^{T}$ represents a singular value decomposition of $\bar{\mathcal{T}}(:,:,j)$; $\bar{\mathcal{S}}_w(:,:,j)$ is defined as:
$$\bar{\mathcal{S}}_w(i,i,j) = \begin{cases} \dfrac{c_1 + \sqrt{c_2}}{2}, & c_2 \ge 0 \\ 0, & c_2 < 0 \end{cases}$$
wherein $c_1 = \bar{\mathcal{S}}(i,i,j) - \epsilon$ and $c_2 = \left(\bar{\mathcal{S}}(i,i,j) + \epsilon\right)^2 - 4C$; $\epsilon$ is a positive value small enough that the corresponding inequality holds; C is a constraint parameter for setting the weight $w_i^{(j)}$, such as $w_i^{(j)} = C / \left(\bar{\mathcal{S}}_w(i,i,j) + \epsilon\right)$.
Compared with the prior art, the present application provides a consensus graph learning-based multi-view clustering method and constructs a consistent similarity graph for clustering based on spectral embedding features. In this low-dimensional space, noise and redundant information are effectively filtered out, resulting in a similarity graph that well describes the cluster structure of the data.
The embodiments of the present application are illustrated below through specific examples, and other advantages and effects of the present application can be easily understood by those skilled in the art based on the contents disclosed herein. The present application can also be implemented or applied through other different specific embodiments. Various modifications or changes to the details described in the specification can be made based on different perspectives and applications without departing from the spirit of the present application. It should be noted that, unless conflicting, the embodiments and features of the embodiments may be combined with each other.
Aiming at the existing defects, the present application provides a consensus graph learning-based multi-view clustering method.
As shown in the accompanying drawing, the method includes the following steps:
S11, inputting an original data matrix to obtain a spectral embedding matrix;
S12, calculating a similarity graph matrix and a Laplacian matrix based on the spectral embedding matrix;
S13, applying spectral clustering to the calculated similarity graph matrix to obtain spectral embedding representations;
S14, stacking inner products of the normalized spectral embedding representations into a third-order tensor and using low-rank tensor representation learning to obtain a consistent distance matrix;
S15, integrating spectral embedding representation learning and low-rank tensor representation learning into a unified learning framework to obtain an objective function;
S16, solving the obtained objective function through an alternating iterative optimization strategy;
S17, constructing a consistent similarity graph based on the solved result;
S18, applying spectral clustering to the consistent similarity graph to obtain a clustering result.
The embodiment provides a consensus graph learning-based multi-view clustering (CGLMVC) method that learns a consistent similarity graph for clustering from a new feature space. Specifically, spectral embedding representations are first obtained from the similarity graphs of each view, and the inner products of multiple normalized spectral embedding representations are stacked into a third-order tensor. Then, high-order consistency information among multiple views is mined using the weighted tensor nuclear norm. Spectral embedding and low-rank tensor learning are further integrated into a unified learning framework to jointly learn the spectral embeddings and the tensor representation. The embodiment takes into account the distribution differences of noise and redundancy across multiple views. By constraining the global consistency of multiple views, noise and redundant information can be effectively filtered out. Therefore, the learned spectral embedding representations are more suitable for constructing the intrinsic similarity graph of the data for clustering tasks. Based on the solved spectral embedding features, a consistent similarity graph can be constructed for clustering.
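For orientation, a high-level sketch of the resulting alternating procedure is given below. It reuses the spectral_embedding, stack_inner_products, weighted_tsvt, and adaptive_neighborhood_graph helpers sketched alongside the corresponding steps in this description; update_embeddings, consensus_distance, and spectral_clustering_labels are hypothetical placeholders, so the sketch illustrates the control flow rather than a definitive implementation.

```python
import numpy as np

def cglmvc(A_list, c, lam=10.0, C=100.0, n_iter=50, k=15):
    """Illustrative control flow of CGLMVC (not a definitive implementation).

    A_list : per-view graph matrices A^(v); c : number of clusters.
    Alternates between the embedding update (S61/S62) and the
    low-rank tensor update (S63/S64), then builds the consensus graph.
    """
    H_list = [spectral_embedding(A, c) for A in A_list]   # initialization
    for _ in range(n_iter):
        T = stack_inner_products(H_list)                  # build tensor T
        G = weighted_tsvt(T, C)                           # tensor update
        H_list = update_embeddings(A_list, G, lam, c)     # hypothetical embedding update
    D = consensus_distance(H_list)                        # hypothetical consensus distances
    S = adaptive_neighborhood_graph(D, k=k)               # consistent similarity graph
    return spectral_clustering_labels(S, c)               # hypothetical final clustering
```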
For real-world data, noise and redundant information inevitably mix into the original features. Therefore, the similarity graph learned from the original features is not accurate. To address the issue, an adaptive neighborhood graph is learned in a new low-dimensional feature space. The adaptive neighborhood graph can be obtained by solving the following problem:
$$\min_{S} \ \sum_{i,j=1}^{n}\left( \left\| f_i - f_j \right\|_2^2\, s_{ij} + \gamma\, s_{ij}^2 \right), \quad \text{s.t. } \ s_i^T \mathbf{1} = 1,\ s_{ij} \ge 0$$
wherein $f_i$ represents the low-dimensional feature of the i-th sample; $s_i$ represents the i-th row of the similarity graph S; $\gamma$ represents a regularization parameter.
Through the normalization operation, the spectral embedding representations corresponding to the samples are distributed on a unit hypersphere. In this case, similarity graphs based on the Euclidean distance can effectively capture the cluster structure of the data.
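Concretely, for row-normalized embeddings $\bar{h}_i$ with $\bar{h}_i \bar{h}_i^T = 1$, the squared Euclidean distance reduces to

$$\left\|\bar{h}_i - \bar{h}_j\right\|_2^2 = \bar{h}_i\bar{h}_i^T - 2\,\bar{h}_i\bar{h}_j^T + \bar{h}_j\bar{h}_j^T = 2 - 2\,\bar{h}_i\bar{h}_j^T,$$

so that small Euclidean distances on the unit hypersphere correspond exactly to large inner-product (cosine) similarities.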
In step S11, an original data matrix is input to obtain a spectral embedding matrix.
The original data matrix is $\{X^{(v)}\}_{v=1}^{V}$, wherein $X^{(v)} \in \mathbb{R}^{n \times d_v}$ represents the feature matrix of the v-th view and $d_v$ represents its feature dimension.
In step S12, a similarity graph matrix and a Laplacian matrix are calculated based on the spectral embedding matrix.
The spectral embedding matrix can be obtained by applying spectral clustering to the view-specific similarity graph W(v). The objective function is as follows:
$$\max_{H^{(v)}} \operatorname{Tr}\left(H^{(v)T} A^{(v)} H^{(v)}\right), \quad \text{s.t. } H^{(v)T} H^{(v)} = I_c$$
wherein $H^{(v)} \in \mathbb{R}^{n \times c}$ represents a spectral embedding matrix of the v-th view; $A^{(v)}$ represents a Laplacian matrix of the v-th view; n represents a number of data samples; c represents a number of clusters; Tr(·) represents a trace of a matrix; $H^{(v)T}$ represents a transpose of $H^{(v)}$; $I_c$ represents a c×c identity matrix.
The S obtained in the above formula mainly depends on the distance matrix $D^{h(v)}$, such as $D_{ij}^{h(v)} = \big\|\bar{h}_i^{(v)} - \bar{h}_j^{(v)}\big\|_2^2$, wherein $\bar{h}_i^{(v)}$ represents the i-th row of the normalized spectral embedding matrix $\bar{H}^{(v)}$.
In step S14, inner products of the normalized spectral embedding representations $\{\bar{H}^{(v)}\bar{H}^{(v)T}\}_{v=1}^{V}$ are stacked into a third-order tensor, and low-rank tensor representation learning is formulated as:
$$\min_{\mathcal{G}} \ \|\mathcal{G}\|_{*} + \lambda\,\|\mathcal{T} - \mathcal{G}\|_F^2, \quad \mathcal{T} = \Phi\left(\bar{H}^{(1)}\bar{H}^{(1)T}, \ldots, \bar{H}^{(V)}\bar{H}^{(V)T}\right)$$
wherein $\mathcal{T} \in \mathbb{R}^{n \times V \times n}$ represents the third-order tensor, $\mathcal{G} \in \mathbb{R}^{n \times V \times n}$ represents the low-rank tensor to be learned, V represents the number of views, and $\Phi(\cdot)$ represents the stacking of matrices into a tensor.
In step S15, spectral embedding representation learning and low-rank tensor representation learning are integrated into a unified learning framework to obtain an objective function. The objective function of the consensus graph learning-based multi-view clustering method provided by the embodiment can be expressed as follows:
$$\min_{H^{(v)},\ \mathcal{G}} \ -\sum_{v=1}^{V}\operatorname{Tr}\left(H^{(v)T}A^{(v)}H^{(v)}\right) + \lambda\,\|\mathcal{T}-\mathcal{G}\|_F^2 + \tau\,\|\mathcal{G}\|_{*}, \quad \text{s.t. } H^{(v)T}H^{(v)}=I_c$$
wherein λ represents a penalty parameter, τ represents the singular value threshold, and $\mathcal{G} \in \mathbb{R}^{n \times V \times n}$ represents a third-order tensor. The first term of the formula is spectral embedding, which aims to obtain low-dimensional representations while preserving the local characteristics of the data. The second and third terms are used to mine the principal components of the tensor and constrain the consistency of the matrices $\{\bar{H}^{(v)}\bar{H}^{(v)T}\}_{v=1}^{V}$ across the multiple views.
$$\mathcal{D}_{\tau}(\mathcal{T}) = \mathcal{U} * \mathcal{S}_{\tau} * \mathcal{V}^{T}$$
wherein $\mathcal{S}_{\tau} = \text{ifft}\left((\bar{\mathcal{S}} - \tau)_{+}, [\,], 3\right)$, $t_{+} = \max(t, 0)$, and $\mathcal{T} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{T}$ represents the tensor singular value decomposition (t-SVD) of $\mathcal{T}$.
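As an illustration, a NumPy sketch of this tensor singular value thresholding along the third dimension is given below; treating τ as a fixed scalar threshold is the only assumption beyond the formula above.

```python
import numpy as np

def tensor_svt(T, tau):
    """Tensor singular value thresholding via FFT along the 3rd dimension.

    Each frontal slice of fft(T, axis=2) is soft-thresholded in its
    singular values, then transformed back with the inverse FFT.
    """
    Tf = np.fft.fft(T, axis=2)
    Gf = np.empty_like(Tf)
    for j in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, j], full_matrices=False)
        s_shrunk = np.maximum(s - tau, 0.0)        # (S - tau)_+
        Gf[:, :, j] = (U * s_shrunk) @ Vh
    return np.real(np.fft.ifft(Gf, axis=2))

rng = np.random.default_rng(0)
G = tensor_svt(rng.standard_normal((10, 4, 10)), tau=0.5)
print(G.shape)  # (10, 4, 10)
```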
In the above formula, each singular value undergoes a shrinking operation with the same singular value threshold τ. However, relatively larger singular values quantify the information of the principal components and should therefore undergo less shrinkage; excessive penalization of larger singular values hinders the mining of key information from the tensor. Therefore, in the embodiment, a weighted tensor nuclear norm is introduced to enhance the flexibility of the tensor nuclear norm. The weighted tensor nuclear norm is expressed as follows:
$$\|\mathcal{G}\|_{w,*} = \sum_{j=1}^{n} \sum_{i} w_i^{(j)}\, \bar{\mathcal{S}}(i,i,j)$$
wherein $w_i^{(j)}$ represents the singular value weights.
By replacing the nuclear norm in the above objective function with the weighted tensor nuclear norm, the final objective function is obtained.
The objective function of the consensus graph learning-based multi-view clustering method provided by the embodiment can be expressed as follows:
$$\min_{H^{(v)},\ \mathcal{G}} \ -\sum_{v=1}^{V}\operatorname{Tr}\left(H^{(v)T}A^{(v)}H^{(v)}\right) + \lambda\,\|\mathcal{T}-\mathcal{G}\|_F^2 + \|\mathcal{G}\|_{w,*}, \quad \text{s.t. } H^{(v)T}H^{(v)}=I_c$$
wherein λ represents a penalty parameter.
By solving the objective function, the consistent similarity graph S can be obtained from the solved spectral embedding matrices $\{\bar{H}^{(v)}\}_{v=1}^{V}$ using the adaptive neighborhood graph learning method.
In step S16, the obtained objective function is solved through an alternating iterative optimization strategy, which specifically includes:
S61, fixing the variable $\mathcal{G}$ and unfolding tensors $\mathcal{T}$ and $\mathcal{G}$ into matrix form by discarding irrelevant terms, then the objective function being expressed as:
$$\max_{H^{(v)T}H^{(v)}=I_c} \ \operatorname{Tr}\left(H^{(v)T}A^{(v)}H^{(v)}\right) - \lambda\,\big\|\bar{H}^{(v)}\bar{H}^{(v)T} - T^{(v)}\big\|_F^2$$
wherein $T^{(v)}$ represents the v-th lateral slice of $\mathcal{G}$, such as $T^{(v)} = \mathcal{G}(:, v, :)$. By expanding the Frobenius norm and discarding the terms that do not depend on $H^{(v)}$, the above formula can be further rewritten as follows:
$$\max_{H^{(v)T}H^{(v)}=I_c} \ \operatorname{Tr}\left(H^{(v)T}A^{(v)}H^{(v)}\right) + 2\lambda\,\operatorname{Tr}\left(\bar{H}^{(v)T}\, T^{(v)}\, \bar{H}^{(v)}\right)$$
S62, making $P^{(v)} \in \mathbb{R}^{n \times n}$ represent a diagonal matrix, then diagonal elements being defined as:
$$P^{(v)}(i,i) = \frac{1}{\left\|h_i^{(v)}\right\|_2}$$
wherein $h_i^{(v)}$ and $h_j^{(v)}$ represent the i-th and j-th rows of the spectral embedding matrix $H^{(v)}$, respectively; $h_i^{(v)T}$ represents a transpose of $h_i^{(v)}$; thus, the following equation holds:
$$\bar{H}^{(v)} = P^{(v)} H^{(v)}$$
By integrating the above formulas, the optimization problem can be further rewritten as follows:
$$\max_{H^{(v)T}H^{(v)}=I_c} \ \operatorname{Tr}\left(H^{(v)T}\, G^{(v)}\, H^{(v)}\right)$$
wherein $G^{(v)} = A^{(v)} + 2\lambda\, P^{(v)}\, T^{(v)}\, P^{(v)}$,
and the optimal solution for H(v) can be obtained by selecting the eigenvectors corresponding to the c largest eigenvalues of the matrix G(v).
S63, fixing the variable $\{H^{(v)}\}_{v=1}^{V}$ and discarding other irrelevant terms, then the objective function being expressed as:
$$\min_{\mathcal{G}} \ \|\mathcal{G}\|_{w,*} + \lambda\,\|\mathcal{T} - \mathcal{G}\|_F^2$$
wherein, for a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, both the weighted tensor nuclear norm and the Frobenius norm can be computed slice by slice in the Fourier domain;
thus, the above formula has the following equivalent formulation:
$$\min_{\bar{\mathcal{G}}} \ \sum_{j=1}^{n} \left( \big\|\bar{\mathcal{G}}(:,:,j)\big\|_{w,*} + \lambda\,\big\|\bar{\mathcal{T}}(:,:,j) - \bar{\mathcal{G}}(:,:,j)\big\|_F^2 \right)$$
wherein $\bar{\mathcal{G}}(:,:,j)$ and $\bar{\mathcal{T}}(:,:,j)$ represent the j-th frontal slice of $\bar{\mathcal{G}}$ and $\bar{\mathcal{T}}$, respectively; $\bar{\mathcal{G}}$ and $\bar{\mathcal{T}}$ represent results of the fast Fourier transform along the third dimension for $\mathcal{G}$ and $\mathcal{T}$, respectively, such as $\bar{\mathcal{G}} = \text{fft}(\mathcal{G}, [\,], 3)$, $\bar{\mathcal{T}} = \text{fft}(\mathcal{T}, [\,], 3)$.
S64, solving $\bar{\mathcal{G}}(:,:,j)$ to obtain a solution of the objective function.
$\bar{\mathcal{G}}(:,:,j)$ has the following approximate solution:
$$\bar{\mathcal{G}}(:,:,j) = \bar{\mathcal{U}}(:,:,j)\, \bar{\mathcal{S}}_w(:,:,j)\, \bar{\mathcal{V}}(:,:,j)^{T}$$
wherein $\bar{\mathcal{T}}(:,:,j) = \bar{\mathcal{U}}(:,:,j)\, \bar{\mathcal{S}}(:,:,j)\, \bar{\mathcal{V}}(:,:,j)^{T}$ represents a singular value decomposition of $\bar{\mathcal{T}}(:,:,j)$; $\bar{\mathcal{S}}_w(:,:,j)$ is defined as:
$$\bar{\mathcal{S}}_w(i,i,j) = \begin{cases} \dfrac{c_1 + \sqrt{c_2}}{2}, & c_2 \ge 0 \\ 0, & c_2 < 0 \end{cases}$$
wherein $c_1 = \bar{\mathcal{S}}(i,i,j) - \epsilon$ and $c_2 = \left(\bar{\mathcal{S}}(i,i,j) + \epsilon\right)^2 - 4C$; $\epsilon$ is a positive value small enough that the corresponding inequality holds; C is a constraint parameter for setting the weight $w_i^{(j)}$, such as $w_i^{(j)} = C / \left(\bar{\mathcal{S}}_w(i,i,j) + \epsilon\right)$.
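For illustration, a NumPy sketch of this per-slice weighted shrinkage is given below; it follows the closed form stated above, with ϵ and C supplied by the caller.

```python
import numpy as np

def weighted_tsvt(T, C, eps=1e-6):
    """Weighted tensor shrinkage: per-frontal-slice closed form in the
    Fourier domain, following the c1/c2 rule above."""
    Tf = np.fft.fft(T, axis=2)
    Gf = np.empty_like(Tf)
    for j in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, j], full_matrices=False)
        c1 = s - eps
        c2 = (s + eps) ** 2 - 4.0 * C
        # shrunken singular values: (c1 + sqrt(c2)) / 2 when c2 >= 0, else 0
        s_w = np.where(c2 >= 0, (c1 + np.sqrt(np.maximum(c2, 0.0))) / 2.0, 0.0)
        Gf[:, :, j] = (U * s_w) @ Vh
    return np.real(np.fft.ifft(Gf, axis=2))

rng = np.random.default_rng(0)
G = weighted_tsvt(rng.standard_normal((10, 4, 10)), C=0.25)
print(G.shape)  # (10, 4, 10)
```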
In step S17, constructing a consistent similarity graph is expressed as:
$$\min_{S} \ \sum_{i,j=1}^{n}\left( d_{ij}\, s_{ij} + \gamma\, s_{ij}^2 \right), \quad \text{s.t. } \ s_i^T \mathbf{1} = 1,\ s_{ij} \ge 0$$
wherein $d_{ij}$ represents the consensus distance between the i-th and j-th samples computed from the solved spectral embedding features; $\gamma$ represents a regularization parameter; $s_i$ represents the i-th row of the consistent similarity graph S.
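A minimal sketch of adaptive neighborhood graph construction from a distance matrix is shown below; it uses the standard closed-form solution of the problem above with k nearest neighbors, which is an assumption about the concrete solver rather than part of the claimed method.

```python
import numpy as np

def adaptive_neighborhood_graph(D, k=15):
    """Closed-form adaptive neighborhood graph from a distance matrix D (n x n).

    Each sample keeps nonzero similarity only to its k nearest neighbors;
    rows sum to one, with gamma chosen per row so that the (k+1)-th
    neighbor weight is exactly zero.
    """
    n = D.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        d = D[i].copy()
        d[i] = np.inf                          # exclude self-similarity
        idx = np.argsort(d)[: k + 1]           # k nearest neighbors plus one
        dk = d[idx]
        denom = k * dk[k] - np.sum(dk[:k])
        S[i, idx[:k]] = (dk[k] - dk[:k]) / max(denom, 1e-12)
    return (S + S.T) / 2                       # symmetrize

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
S = adaptive_neighborhood_graph(D, k=5)
print(S.shape)  # (20, 20)
```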
The embodiment provides a consensus graph learning-based multi-view clustering method (CGLMVC). Compared to other multi-view clustering algorithms such as LT-MSC, MLAN, GMC, and SM2SC, the CGLMVC method constructs a consistent similarity graph for clustering based on spectral embedding features. In this low-dimensional space, noise and redundant information are effectively filtered out, resulting in a similarity graph that well describes the cluster structure of the data.
The difference between the consensus graph learning-based multi-view clustering method provided in this embodiment and that in Embodiment I is as follows:
To fully verify the effectiveness of the CGLMVC method of the present application, the performance of the CGLMVC method is first tested on six commonly used benchmark databases (MSRCV1, ORL, 20newsgroups, 100leaves, COIL20, handwritten). A comparison is made with the following two single-view clustering algorithms and seven currently popular multi-view clustering algorithms:
(1) SC: spectral clustering algorithm.
(2) LRR: This method uses a nuclear norm constraint to construct a low-rank subspace representation for clustering.
(3) MLAN: This method automatically assigns weights to each view and learns a similarity graph with Laplacian rank constraints for clustering.
(4) MCGC: This method reduces the differences between views using a collaborative regularization term and learns a similarity graph with Laplacian rank constraints for clustering from multiple spectral embedding matrices.
(5) GMC: This method integrates adaptive neighborhood graph learning and multiple similarity graph fusion into a unified framework to learn a similarity graph with Laplacian rank constraints for clustering.
(6) SM2SC: This method uses variable splitting and multiplicative decomposition strategies to mine the intrinsic structure of multiple views from view-specific subspace representations and constructs a structured similarity graph for clustering.
(7) LT-MSC: This method stacks multiple subspace representations into a tensor and learns a low-rank tensor subspace representation for clustering by constraining the three modes of the tensor to have a low rank.
(8) t-SVD-MS: This method stacks multiple subspace representations into a tensor and learns a low-rank tensor subspace representation for clustering by constraining the tensor to have a low rank using tensor nuclear norm based on tensor singular value decomposition.
(9) ETLMSC: This method stacks multiple probability transition matrices into a tensor and learns the intrinsic probability transition matrix using the tensor nuclear norm and the l2,1 norm. Then, the final clustering results are obtained from the intrinsic probability transition matrix using Markov chain-based spectral clustering.
In the experiments, the CGLMVC method was compared with nine other clustering methods on six publicly available databases. The specific information about the six databases is as follows: MSRCV1: It contains a total of 210 images for scene recognition of seven categories. Each image is described using six different types of features, namely 256-dimensional LBP features, 100-dimensional HOG features, 512-dimensional GIST features, 48-dimensional Color Moment features, 1302-dimensional CENTRIST features, and 210-dimensional SIFT features. ORL: It contains a total of 400 face images of 40 individuals under different lighting conditions, times, and facial details. In the experiment, three different types of features, namely 4096-dimensional intensity features, 3304-dimensional LBP features, and 6750-dimensional Gabor features, are used to describe each face image.
20newsgroups: It is a document dataset that contains a total of 500 samples of five categories. In the experiment, three different document preprocessing techniques result in three different types of features.
100leaves: This dataset contains a total of 1600 plant images of 100 categories. In the experiment, three different types of features, including shape, texture, and edge from each image were extracted according to the embodiment.
COIL20: It contains a total of 1440 object images of 20 categories. For each image, 1024-dimensional intensity features, 3304-dimensional LBP features, and 6750-dimensional Gabor features were extracted according to the embodiment.
handwritten: It contains a total of 2000 handwritten digit images ranging from 0 to 9. For each image, 76-dimensional FOU features, 216-dimensional FAC features, 64-dimensional KAR features, 240-dimensional Pix features, 47-dimensional ZER features, and 6-dimensional MOR features were extracted according to the embodiment.
SC and LRR are two single-view clustering algorithms. According to the embodiment, the two single-view clustering algorithms were applied to each view of the data, and the best clustering results were reported. For the SC algorithm, the number of nearest neighbors for the adaptive neighborhood similarity graph was set to 15. For the LRR algorithm, parameters were selected from the range [10^-3, 10^-2, ..., 10^2, 10^3] using a grid search strategy. For the MLAN and GMC algorithms, the number of nearest neighbors was set to 9 and 15, respectively, according to the settings in their respective papers. For the MCGC algorithm, the number of nearest neighbors was set to 15, and the regularization parameter was selected from [0.6:5:100]. For the SM2SC algorithm, the three hyperparameters were selected from [0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 1, 10, 40, 100], [0.1, 0.5, 1, 1.5, 2], and [0.05, 0.1, 0.4, 1, 5], respectively. For the LT-MSC algorithm, the hyperparameter was selected from [0:0.05:0.5, 10:10:100]. For the t-SVD-MS and ETLMSC algorithms, their hyperparameters were selected from [0.1:0.1:2] and [10^-4:10^-4:10^-3, 10^-3:10^-3:10^-2, ..., 10^1:10^1:10^2], respectively. For the CGLMVC algorithm provided in the embodiment, the nearest neighbor parameter was set to 15, and λ and C were selected from the set [1, 5, 10, 50, 100, 500, 1000, 5000] using a grid search strategy.

To ensure a fair comparison, each experiment was repeated 20 times, and the average results are reported. In addition, seven indexes, namely accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (ARI), F-score, Precision, Recall, and Purity, were used to evaluate the clustering performance according to the embodiment. Higher values of these seven indexes indicate better clustering performance.
Table 1 shows the seven clustering index results for different methods on the six databases. The following conclusions can be drawn from the embodiment.
(1) The CGLMVC algorithm significantly outperforms the other comparative algorithms. Taking the MSRCV1 dataset as an example, the CGLMVC algorithm outperforms the second-best SM2SC algorithm by 5.24, 10.66, and 5.24 percentage points in terms of the ACC, NMI, and Purity indexes, respectively. This validates the advantages and effectiveness of the method provided in the embodiment. The CGLMVC algorithm achieves a better clustering effect for two main reasons. Firstly, the CGLMVC algorithm learns the similarity graph from the spectral embedding matrices instead of the original features. Secondly, performing spectral embedding and low-rank tensor learning simultaneously makes it possible to obtain high-quality spectral embedding features.
(2) The CGLMVC algorithm outperforms the MCGC, MLAN, GMC, and ETLMSC algorithms, which are graph-based multi-view clustering algorithms. The MLAN, GMC, and ETLMSC algorithms learn the similarity graph from the original features for clustering. However, the presence of noise and redundant information in the original features limits the ability of the learned similarity graph to reveal the intrinsic structure of the data, thereby restricting their clustering effect. The MCGC algorithm mines only the pairwise correlations between multiple views when learning a consistent similarity graph from spectral embeddings for clustering, and its clustering performance is therefore also limited.
(3) The CGLMVC algorithm outperforms LT-MSC, t-SVD-MS, and ETLMSC, three tensor-based multi-view clustering algorithms, on most datasets. This indicates that learning similarity graphs for clustering in the spectral embedding feature space yields better results than doing so in the original feature space.
(4) Compared to LT-MSC, t-SVD-MS, and SM2SC, three subspace-based multi-view clustering algorithms, the CGLMVC algorithm achieves the best results on most datasets. The reason for the relatively good performance of the LT-MSC and t-SVD-MS algorithms on the 20newsgroups dataset may be that the removal of outliers by the l2,1 norm enables the subspace segmentation to effectively mine the cluster structure of the data.
(5) SC and LRR are two effective single-view clustering algorithms. Compared with the other comparative methods, they often achieve feasible or even better clustering effects. However, the CGLMVC algorithm achieves a better clustering effect on all datasets, which indicates the superiority of the CGLMVC algorithm.
To verify that the embedding features learned by the CGLMVC algorithm are more conducive than the original features to constructing intrinsic similarity graphs for clustering tasks, this embodiment obtains view-specific similarity graphs and average similarity graphs from both the original features and the learned embedding features. Then, spectral clustering is performed on these similarity graphs, and the clustering accuracy (ACC) index is recorded according to the embodiment. As shown in Table 2, with an increasing number of iterations, the embedding features learned by the CGLMVC algorithm of the embodiment construct better similarity graphs, providing an improved clustering effect. This effectively validates the superiority of the CGLMVC algorithm.
The present application contains two parameters λ and C. In the experiments of the embodiment, the parameters λ and C were selected from the range [1, 5, 10, 50, 100, 500, 1000, 5000] using grid search.
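A sketch of this selection procedure is given below; cglmvc (outlined earlier in this description) and the accuracy scorer passed in as score are hypothetical stand-ins for the method and the evaluation described above.

```python
from itertools import product

# grid mirroring the range used in the experiments for lambda and C
grid = [1, 5, 10, 50, 100, 500, 1000, 5000]

def select_parameters(A_list, c, labels, score):
    """Hypothetical grid search: return the (lambda, C) pair with best ACC."""
    best = (None, None, -1.0)
    for lam, C in product(grid, grid):
        pred = cglmvc(A_list, c, lam=lam, C=C)   # hypothetical solver call
        acc = score(labels, pred)                # e.g., clustering ACC
        if acc > best[2]:
            best = (lam, C, acc)
    return best
```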
In the optimization process of solving the objective function, the computational complexity mainly lies in updating the variables $\{H^{(v)}\}_{v=1}^{V}$ and $\mathcal{G}$. Updating the variable $\{H^{(v)}\}_{v=1}^{V}$ requires a complexity of $O(n^2 c)$ to compute, in each iteration, the eigenvectors corresponding to the c largest eigenvalues of an n×n matrix. Updating the variable $\mathcal{G}$ requires complexities of $O(n^2 V \log(n))$ and $O(n^2 V^2)$ to perform the fast Fourier transform, the inverse fast Fourier transform, and the singular value decompositions on n×V matrices, respectively. Computing the similarity graph and the final spectral clustering requires complexities of $O(n \log(n))$ and $O(n^2 c)$, respectively. Therefore, the overall computational complexity of the CGLMVC algorithm is $O(tVn^2c + tn^2V\log(n) + tn^2V^2 + n\log(n) + n^2c)$, wherein t represents the number of iterations.
To verify the convergence of the CGLMVC algorithm, this embodiment recorded the convergence curves of the objective function of the algorithm on the six datasets. As shown in the recorded curves, the objective function value stabilizes within a limited number of iterations, which verifies the convergence of the algorithm.
It should be noted that the above is only the preferred embodiments of the present application and the principles of the employed technologies. It should be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, and those skilled in the art can make various obvious changes, rearrangements, and substitutions without departing from the protection scope of the present application. Therefore, although the above embodiments have provided a detailed description of the present application, the application is not limited to the above embodiments, and may further include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.