System and method for probabilistic relational clustering

Description

BACKGROUND OF THE INVENTION
1. Introduction

Most clustering approaches in the literature focus on “flat” data in which each data object is represented as a fixed-length attribute vector [38]. However, many real-world data sets are much richer in structure, involving objects of multiple types that are related to each other, such as documents and words in a text corpus, Web pages, search queries and Web users in a Web search system, and shops, customers, suppliers, shareholders and advertisement media in a marketing system.

An intuitive solution is to transform relational data into flat data and then cluster each type of object independently. However, this may not work well, for the following reasons. First, the transformation causes the loss of relation and structure information [14]. Second, traditional clustering approaches are unable to tackle influence propagation in clustering relational data, i.e., the hidden patterns of different types of objects could affect each other both directly and indirectly (passing along relation chains). Third, in some data mining applications, users are interested not only in the hidden structure for each type of object, but also in interaction patterns involving multiple types of objects. For example, in document clustering, in addition to document clusters and word clusters, the relationship between document clusters and word clusters is also useful information. It is difficult to discover such interaction patterns by clustering each type of object individually.

Moreover, a number of important clustering problems, which have been of intensive interest in the literature, can be viewed as special cases of relational clustering. For example, graph clustering (partitioning) [7, 42, 13, 6, 20, 28] can be viewed as clustering on single-type relational data consisting of only homogeneous relations (represented as a graph affinity matrix); co-clustering [12, 2], which arises in important applications such as document clustering and micro-array data clustering, can be formulated as clustering on bi-type relational data consisting of only heterogeneous relations.

Recently, semi-supervised clustering [46, 4], a special type of clustering that uses both labeled and unlabeled data, has attracted significant attention. It can be formulated as clustering on single-type relational data consisting of attributes and homogeneous relations. Therefore, relational data present not only huge challenges to traditional unsupervised clustering approaches, but also a great need for theoretical unification of various clustering tasks.

2. Related Work

Clustering on a special case of relational data, bi-type relational data consisting of only heterogeneous relations, such as the word-document data, is called co-clustering or bi-clustering. Several previous efforts related to co-clustering are model based [22, 23]. Spectral graph partitioning has also been applied to bi-type relational data [11, 25]. These algorithms formulate the data matrix as a bipartite graph and seek to find the optimal normalized cut for the graph.

Due to the nature of a bipartite graph, these algorithms have the restriction that the clusters from different types of objects must have one-to-one associations. Information-theory based co-clustering has also attracted attention in the literature. [12] proposes a co-clustering algorithm to maximize the mutual information between the clustered random variables subject to the constraints on the number of row and column clusters. A more generalized co-clustering framework is presented by [2] wherein any Bregman divergence can be used in the objective function. Recently, co-clustering has been addressed based on matrix factorization. [35] proposes an EM-like algorithm based on multiplicative updating rules.

Graph clustering (partitioning) clusters homogeneous data objects based on pairwise similarities, which can be viewed as homogeneous relations. Graph partitioning has been studied for decades and a number of different approaches, such as spectral approaches [7, 42, 13] and multilevel approaches [6, 20, 28], have been proposed. Some efforts [17, 43, 21, 21, 1] based on stochastic block modeling also focus on homogeneous relations. Compared with co-clustering and homogeneous-relation-based clustering, clustering on general relational data, which may consist of more than two types of data objects with various structures, has not been well studied in the literature. Several notable efforts are discussed as follows. [45, 19] extend the probabilistic relational model to the clustering scenario by introducing latent variables into the model; these models focus on using attribute information for clustering. [18] formulates star-structured relational data as a star-structured m-partite graph and develops an algorithm based on semi-definite programming to partition the graph. [34] formulates multi-type relational data as k-partite graphs and proposes a family of algorithms to identify the hidden structures of a k-partite graph by constructing a relation summary network to approximate the original k-partite graph under a broad range of distortion measures.

The above graph-based algorithms do not consider attribute information. Some efforts on relational clustering are based on inductive logic programming [37, 24, 31]. Based on the idea of mutual reinforcement clustering, [51] proposes a framework for clustering heterogeneous Web objects and [47] presents an approach to improve the cluster quality of interrelated data objects through an iterative reinforcement clustering process. However, mutual reinforcement clustering has neither a sound objective function nor a theoretical proof of its effectiveness and correctness (convergence). Some efforts [26, 50, 49, 5] in the literature focus on measuring similarities or choosing cross-relational attributes.

To summarize, the research on relational data clustering has attracted substantial attention, especially in the special cases of relational data. However, there is still limited and preliminary work on general relational data clustering.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1C show examples of the structures of relational data;

FIG. 2 shows an NMI comparison of SGP, METIS and MMRC algorithms;

FIG. 3 shows an NMI comparison of BSGP, RSN and MMRC algorithms for bi-type data; and

FIG. 4 shows an NMI comparison of CBGC, RSN and MMRC algorithms for tri-type data.

SUMMARY OF THE INVENTION

Most clustering approaches in the literature focus on “flat” data in which each data object is represented as a fixed-length attribute vector. However, many real-world data sets are much richer in structure, involving objects of multiple types that are related to each other, such as documents and words in a text corpus, Web pages, search queries and Web users in a Web search system, and shops, customers, suppliers, shareholders and advertisement media in a marketing system.

In general, relational data contain three types of information: attributes for individual objects, homogeneous relations between objects of the same type, and heterogeneous relations between objects of different types. For example, for a scientific publication relational data set of papers and authors, the personal information of the authors, such as affiliation, constitutes attributes; the citation relations among papers are homogeneous relations; the authorship relations between papers and authors are heterogeneous relations. Such data violate the classic IID assumption in machine learning and statistics, and present significant challenges to traditional clustering approaches. An intuitive solution is to transform relational data into flat data and then cluster each type of object independently. However, this may not work well, for the following reasons. First, the transformation causes the loss of relation and structure information. Second, traditional clustering approaches are unable to tackle influence propagation in clustering relational data, i.e., the hidden patterns of different types of objects could affect each other both directly and indirectly (passing along relation chains). Third, in some data mining applications, users are interested not only in the hidden structure for each type of object, but also in interaction patterns involving multiple types of objects. For example, in document clustering, in addition to document clusters and word clusters, the relationship between document clusters and word clusters is also useful information. It is difficult to discover such interaction patterns by clustering each type of object individually.

Moreover, a number of important clustering problems, which have been of intensive interest in the literature, can be viewed as special cases of relational clustering. For example, graph clustering (partitioning) can be viewed as clustering on single-type relational data consisting of only homogeneous relations (represented as a graph affinity matrix); co-clustering, which arises in important applications such as document clustering and micro-array data clustering, can be formulated as clustering on bi-type relational data consisting of only heterogeneous relations.

Recently, semi-supervised clustering, a special type of clustering that uses both labeled and unlabeled data, has attracted significant attention. It can be formulated as clustering on single-type relational data consisting of attributes and homogeneous relations.

The present system and method is based on a probabilistic model for relational clustering, which also provides a principled framework to unify various important clustering tasks including traditional attributes-based clustering, semi-supervised clustering, co-clustering and graph clustering. The model seeks to identify cluster structures for each type of data object and interaction patterns between different types of objects. It is applicable to relational data of various structures. Under this model, parametric hard and soft (and hybrid) relational clustering algorithms are presented under a large number of exponential family distributions.

There are three main advantages: (1) the technique is applicable to various relational data from various applications; (2) it is capable of adopting different distribution assumptions for different relational data with different statistical properties; and (3) the resulting parameter matrices provide an intuitive summary of the hidden structure of the relational data.

3. Model Formulation

A probabilistic model is herein proposed for relational clustering, which also provides a principled framework to unify various important clustering tasks including traditional attributes-based clustering, semi-supervised clustering, co-clustering and graph clustering. The proposed model seeks to identify cluster structures for each type of data object and interaction patterns between different types of objects. It is applicable to relational data of various structures. Under this model, parametric hard and soft relational clustering algorithms are provided under a large number of exponential family distributions. The algorithms are applicable to various relational data from various applications and at the same time unify a number of state-of-the-art clustering algorithms: co-clustering algorithms, k-partite graph clustering, Bregman k-means, and semi-supervised clustering based on hidden Markov random fields.

With different compositions of the three types of information (attributes, homogeneous relations and heterogeneous relations), relational data could have very different structures. FIGS. 1A-1C show three examples of the structures of relational data. FIG. 1A refers to simple bi-type relational data with only heterogeneous relations, such as word-document data. FIG. 1B represents bi-type data with all types of information, such as actor-movie data, in which actors (type 1) have attributes such as gender; actors are related to each other by collaboration in movies (homogeneous relations); actors are related to movies (type 2) by taking roles in movies (heterogeneous relations). FIG. 1C represents data consisting of companies, customers, suppliers, shareholders and advertisement media, in which customers (type 5) have attributes.

A relational data set is represented as a set of matrices. Assume that a relational data set has m different types of data objects, $\mathcal{X}^{(1)} = \{x_i^{(1)}\}_{i=1}^{n_1}, \ldots, \mathcal{X}^{(m)} = \{x_i^{(m)}\}_{i=1}^{n_m}$, where $n_j$ denotes the number of objects of the j-th type and $x_p^{(j)}$ denotes the name of the p-th object of the j-th type. The observations of the relational data are represented as three sets of matrices: attribute matrices $\{F^{(j)} \in \mathbb{R}^{d_j \times n_j}\}_{j=1}^{m}$, where $d_j$ denotes the dimension of attributes for the j-th type objects and $F_{\cdot p}^{(j)}$ denotes the attribute vector for object $x_p^{(j)}$; homogeneous relation matrices $\{S^{(j)} \in \mathbb{R}^{n_j \times n_j}\}_{j=1}^{m}$, where $S_{pq}^{(j)}$ denotes the relation between $x_p^{(j)}$ and $x_q^{(j)}$; and heterogeneous relation matrices $\{R^{(ij)} \in \mathbb{R}^{n_i \times n_j}\}_{i,j=1}^{m}$, where $R_{pq}^{(ij)}$ denotes the relation between $x_p^{(i)}$ and $x_q^{(j)}$. The above representation is a general formulation. In real applications, not every type of object has attributes, homogeneous relations and heterogeneous relations. For example, the relational data set in FIG. 1A is represented by only one heterogeneous matrix $R^{(12)}$, and the one in FIG. 1B is represented by three matrices, $F^{(1)}$, $S^{(1)}$ and $R^{(12)}$. Moreover, for a specific clustering task, not all available attributes and relations are necessarily used after feature or relation selection pre-processing.
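The matrix representation above may be illustrated by a small container. The following is a minimal sketch only; the class name `RelationalData`, its field names and the toy numbers are hypothetical and chosen for illustration, with matrices held as plain lists of rows and object types keyed by the 1-based type index used in the text.

```python
from dataclasses import dataclass, field


@dataclass
class RelationalData:
    """Hypothetical container for the three sets of observation matrices."""
    attributes: dict = field(default_factory=dict)     # j -> F^(j), d_j rows x n_j columns
    homogeneous: dict = field(default_factory=dict)    # j -> S^(j), n_j x n_j
    heterogeneous: dict = field(default_factory=dict)  # (i, j) -> R^(ij), n_i x n_j

    def num_objects(self, j):
        """Infer n_j from whichever matrix mentions type j."""
        if j in self.homogeneous:
            return len(self.homogeneous[j])
        if j in self.attributes:
            return len(self.attributes[j][0])
        for (a, b), rel in self.heterogeneous.items():
            if a == j:
                return len(rel)
            if b == j:
                return len(rel[0])
        raise KeyError(j)


# FIG. 1A style data: only one heterogeneous matrix R^(12) (2 words x 3 documents).
word_doc = RelationalData(heterogeneous={(1, 2): [[2, 0, 1], [0, 3, 0]]})

# FIG. 1B style data: F^(1), S^(1) and R^(12) (3 actors, 2 movies, 1 attribute).
actor_movie = RelationalData(
    attributes={1: [[0, 1, 1]]},
    homogeneous={1: [[0, 1, 0], [1, 0, 1], [0, 1, 0]]},
    heterogeneous={(1, 2): [[1, 0], [1, 1], [0, 1]]},
)
```

Types that lack a given kind of information simply have no entry in the corresponding dictionary, which mirrors the remark that not every type of object has all three kinds of observation.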

Mixed membership models, which assume that each object has mixed membership denoting its association with classes, have been widely used in the applications involving soft classification [16], such as matching words and pictures [39], race genetic structures [39, 48], and classifying scientific publications [15].

A relational mixed membership model is provided to cluster relational data (referred to as mixed membership relational clustering or MMRC).

Assume that each type of objects $\mathcal{X}^{(j)}$ has $k_j$ latent classes. The membership vectors for all the objects in $\mathcal{X}^{(j)}$ are represented as a membership matrix $Λ^{(j)} \in [0,1]^{k_j \times n_j}$ such that the sum of the elements of each column $Λ_{\cdot p}^{(j)}$ is 1 and $Λ_{\cdot p}^{(j)}$ denotes the membership vector for object $x_p^{(j)}$, i.e., $Λ_{gp}^{(j)}$ denotes the probability that object $x_p^{(j)}$ associates with the g-th latent class. The parameters of the distributions that generate attributes, homogeneous relations and heterogeneous relations are expressed in matrix form. Let $Θ^{(j)} \in \mathbb{R}^{d_j \times k_j}$ denote the distribution parameter matrix for generating attributes $F^{(j)}$ such that $Θ_{\cdot g}^{(j)}$ denotes the parameter vector associated with the g-th latent class. Similarly, $Γ^{(j)} \in \mathbb{R}^{k_j \times k_j}$ denotes the parameter matrix for generating homogeneous relations $S^{(j)}$, and $Υ^{(ij)} \in \mathbb{R}^{k_i \times k_j}$ denotes the parameter matrix for generating heterogeneous relations $R^{(ij)}$. In summary, the parameters of the MMRC model are $Ω = \{\{Λ^{(j)}\}_{j=1}^{m}, \{Θ^{(j)}\}_{j=1}^{m}, \{Γ^{(j)}\}_{j=1}^{m}, \{Υ^{(ij)}\}_{i,j=1}^{m}\}$.

In general, the meanings of the parameters Θ, Γ, and Υ depend on the specific distribution assumptions. However, in Section 4.1, it is shown that for a large number of exponential family distributions, these parameters can be formulated as expectations with intuitive interpretations.

Next, the latent variables are introduced into the model. For each object $x_p^{(j)}$, a latent cluster indicator vector is generated based on its membership parameter $Λ_{\cdot p}^{(j)}$, which is denoted as $C_{\cdot p}^{(j)}$, i.e., $C^{(j)} \in \{0,1\}^{k_j \times n_j}$ is a latent indicator matrix for all the j-th type objects in $\mathcal{X}^{(j)}$.

Finally, we present the generative process of the observations $\{F^{(j)}\}_{j=1}^{m}$, $\{S^{(j)}\}_{j=1}^{m}$, and $\{R^{(ij)}\}_{i,j=1}^{m}$ as follows:

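1. For each object $x_p^{(j)}$
Sample $C_{\cdot p}^{(j)} \sim \mathrm{multinomial}(Λ_{\cdot p}^{(j)}, 1)$.

2. For each object $x_p^{(j)}$
Sample $F_{\cdot p}^{(j)} \sim \Pr(F_{\cdot p}^{(j)} \mid Θ^{(j)} C_{\cdot p}^{(j)})$.

3. For each pair of objects $x_p^{(j)}$ and $x_q^{(j)}$
Sample $S_{pq}^{(j)} \sim \Pr(S_{pq}^{(j)} \mid (C_{\cdot p}^{(j)})^{T} Γ^{(j)} C_{\cdot q}^{(j)})$.

4. For each pair of objects $x_p^{(i)}$ and $x_q^{(j)}$
Sample $R_{pq}^{(ij)} \sim \Pr(R_{pq}^{(ij)} \mid (C_{\cdot p}^{(i)})^{T} Υ^{(ij)} C_{\cdot q}^{(j)})$.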
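The four sampling steps may be sketched for bi-type data with one attribute matrix, one homogeneous relation matrix and one heterogeneous relation matrix. The sketch is illustrative only and rests on assumptions not fixed by the model: attributes are taken as unit-variance normal, homogeneous relations as Bernoulli and heterogeneous relations as Poisson; the function names are hypothetical; class indices are 0-based; and, for convenience, `F[p]` holds the attribute vector of object p (the transpose of the layout of F in the text).

```python
import math
import random


def sample_indicator(membership, rng):
    """One multinomial trial: return class g with probability membership[g]."""
    u, acc = rng.random(), 0.0
    for g, prob in enumerate(membership):
        acc += prob
        if u < acc:
            return g
    return len(membership) - 1  # guard against floating-point round-off


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    limit, count, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def generate(lam1, lam2, theta, gamma, upsilon, rng):
    """lam1[p], lam2[q]: membership vectors; theta[a][g], gamma[g][h], upsilon[g][h]."""
    n1, n2 = len(lam1), len(lam2)
    # Step 1: latent class of every object (index form of the indicator C_.p).
    c1 = [sample_indicator(lam1[p], rng) for p in range(n1)]
    c2 = [sample_indicator(lam2[q], rng) for q in range(n2)]
    # Step 2: Theta C_.p selects column c1[p] of Theta as the attribute mean.
    F = [[rng.gauss(row[c1[p]], 1.0) for row in theta] for p in range(n1)]
    # Step 3: (C_.p)^T Gamma C_.q selects one entry of Gamma as the link probability.
    S = [[int(rng.random() < gamma[c1[p]][c1[q]]) for q in range(n1)]
         for p in range(n1)]
    # Step 4: (C_.p)^T Upsilon C_.q selects one entry of Upsilon as the Poisson rate.
    R = [[sample_poisson(upsilon[c1[p]][c2[q]], rng) for q in range(n2)]
         for p in range(n1)]
    return c1, c2, F, S, R


rng = random.Random(0)
lam1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # three type-1 objects
lam2 = [[0.5, 0.5], [1.0, 0.0]]               # two type-2 objects
theta = [[1.1, 2.3], [1.5, 2.5]]
gamma = [[0.9, 0.1], [0.1, 0.7]]
upsilon = [[1, 3], [3, 1]]
c1, c2, F, S, R = generate(lam1, lam2, theta, gamma, upsilon, rng)
```

Because the latent indicator is a one-hot vector, every matrix product in the process reduces to selecting a column or an entry of the corresponding parameter matrix, which is why the sketch stores class indices rather than indicator vectors.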

In the above generative process, a latent indicator vector for each object is generated based on a multinomial distribution with the membership vector as its parameter. Observations are generated independently, conditioned on the latent indicator variables. The parameters of the conditional distributions are formulated as products of the parameter matrices and latent indicators, i.e.,

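$\Pr(F_{\cdot p}^{(j)} \mid C_{\cdot p}^{(j)}, Θ^{(j)}) = \Pr(F_{\cdot p}^{(j)} \mid Θ^{(j)} C_{\cdot p}^{(j)}),$
$\Pr(S_{pq}^{(j)} \mid C_{\cdot p}^{(j)}, C_{\cdot q}^{(j)}, Γ^{(j)}) = \Pr(S_{pq}^{(j)} \mid (C_{\cdot p}^{(j)})^{T} Γ^{(j)} C_{\cdot q}^{(j)}),$ and
$\Pr(R_{pq}^{(ij)} \mid C_{\cdot p}^{(i)}, C_{\cdot q}^{(j)}, Υ^{(ij)}) = \Pr(R_{pq}^{(ij)} \mid (C_{\cdot p}^{(i)})^{T} Υ^{(ij)} C_{\cdot q}^{(j)}).$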

Under this formulation, an observation is sampled from the distributions of its associated latent classes. For example, if $C_{\cdot p}^{(i)}$ indicates that $x_p^{(i)}$ is with the g-th latent class and $C_{\cdot q}^{(j)}$ indicates that $x_q^{(j)}$ is with the h-th latent class, then $(C_{\cdot p}^{(i)})^{T} Υ^{(ij)} C_{\cdot q}^{(j)} = Υ_{gh}^{(ij)}$. Hence, $\Pr(R_{pq}^{(ij)} \mid Υ_{gh}^{(ij)})$ implies that the relation between $x_p^{(i)}$ and $x_q^{(j)}$ is sampled by using the parameter $Υ_{gh}^{(ij)}$.
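The selection property of one-hot indicators can be checked directly. The helper names below are hypothetical, classes are indexed from 0 in the code, and the parameter values are arbitrary illustration values.

```python
def one_hot(index, size):
    """Indicator vector with a single 1 at position `index`."""
    return [1 if g == index else 0 for g in range(size)]


def bilinear(c_p, param, c_q):
    """Compute (c_p)^T param c_q for a matrix stored as a list of rows."""
    return sum(c_p[g] * param[g][h] * c_q[h]
               for g in range(len(c_p)) for h in range(len(c_q)))


upsilon = [[1, 3], [3, 1]]
# Object p in class g = 0 and object q in class h = 1 select upsilon[0][1].
value = bilinear(one_hot(0, 2), upsilon, one_hot(1, 2))
```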

With matrix representation, the joint probability distribution over the observations and the latent variables can be formulated as follows,

$\begin{matrix} \Pr(Ψ \mid Ω) = \prod_{j=1}^{m} \Pr(C^{(j)} \mid Λ^{(j)}) \prod_{j=1}^{m} \Pr(F^{(j)} \mid Θ^{(j)} C^{(j)}) \prod_{j=1}^{m} \Pr(S^{(j)} \mid (C^{(j)})^{T} Γ^{(j)} C^{(j)}) \prod_{i=1}^{m} \prod_{j=1}^{m} \Pr(R^{(ij)} \mid (C^{(i)})^{T} Υ^{(ij)} C^{(j)}), & (1) \end{matrix}$

where $Ψ = \{\{C^{(j)}\}_{j=1}^{m}, \{F^{(j)}\}_{j=1}^{m}, \{S^{(j)}\}_{j=1}^{m}, \{R^{(ij)}\}_{i,j=1}^{m}\}$,

$\Pr(C^{(j)} \mid Λ^{(j)}) = \prod_{p=1}^{n_j} \mathrm{multinomial}(Λ_{\cdot p}^{(j)}, 1),$

$\Pr(F^{(j)} \mid Θ^{(j)} C^{(j)}) = \prod_{p=1}^{n_j} \Pr(F_{\cdot p}^{(j)} \mid Θ^{(j)} C_{\cdot p}^{(j)}),$

$\Pr(S^{(j)} \mid (C^{(j)})^{T} Γ^{(j)} C^{(j)}) = \prod_{p,q=1}^{n_j} \Pr(S_{pq}^{(j)} \mid (C_{\cdot p}^{(j)})^{T} Γ^{(j)} C_{\cdot q}^{(j)}),$

and similarly for R^(ij).

4. Algorithm Derivation

In this section, the parametric soft and hard relational clustering algorithms based on the MMRC model are derived under a large number of exponential family distributions.

4.1 MMRC with Exponential Families

To avoid clutter, instead of general relational data, relational data similar to the one in FIG. 1B may be employed, which is a representative relational data set containing all three types of information for relational data: attributes, homogeneous relations and heterogeneous relations. However, the derivation and algorithms are applicable to general relational data. For the relational data set in FIG. 1B, there are two types of objects, one attribute matrix F, one homogeneous relation matrix S and one heterogeneous relation matrix R. Based on Eq. (1), we have the following likelihood function,

$L(Ω \mid Ψ) = \Pr(C^{(1)} \mid Λ^{(1)}) \Pr(C^{(2)} \mid Λ^{(2)}) \Pr(F \mid Θ C^{(1)}) \Pr(S \mid (C^{(1)})^{T} Γ C^{(1)}) \Pr(R \mid (C^{(1)})^{T} Υ C^{(2)})$ (2)

One goal is to maximize the likelihood function in Eq. (2) to estimate unknown parameters.

For the likelihood function in Eq. (2), the specific forms of the conditional distributions for attributes and relations depend on the specific application. In principle, a specific algorithm would have to be derived for each specific likelihood function. However, a large number of useful distributions, such as the normal, Poisson, and Bernoulli distributions, belong to exponential families, and the distribution functions of exponential families can be written in a general form. This advantageous property facilitates derivation of a general EM algorithm for the MMRC model.

It is shown in the literature [3, 9] that there exists a bijection between exponential families and Bregman divergences [40]. For example, the normal distribution, Bernoulli distribution, multinomial distribution and exponential distribution correspond to Euclidean distance, logistic loss, KL-divergence and Itakura-Saito distance, respectively. Based on the bijection, an exponential family density Pr(x) can always be formulated as the following expression with a Bregman divergence D_ϕ,

Pr(x)=exp(−D_ϕ(x,μ))ƒ_ϕ(x), (3)

where ƒ_ϕ(x) is a uniquely determined function for each exponential probability density, and μ is the expectation parameter. Therefore, for the MMRC model under exponential family distributions:

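$\Pr(F \mid Θ C^{(1)}) = \exp(-D_{ϕ_1}(F, Θ C^{(1)})) f_{ϕ_1}(F)$ (4)
$\Pr(S \mid (C^{(1)})^{T} Γ C^{(1)}) = \exp(-D_{ϕ_2}(S, (C^{(1)})^{T} Γ C^{(1)})) f_{ϕ_2}(S)$ (5)
$\Pr(R \mid (C^{(1)})^{T} Υ C^{(2)}) = \exp(-D_{ϕ_3}(R, (C^{(1)})^{T} Υ C^{(2)})) f_{ϕ_3}(R)$ (6)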

In the above equations, a Bregman divergence of two matrices is defined as the sum of the Bregman divergences of each pair of corresponding elements from the two matrices. Another advantage of the above formulation is that the parameters Θ, Γ, and Υ are expectations with intuitive interpretations. Θ consists of center vectors of attributes; Γ provides an intuitive summary of the cluster structure within the same type of objects, since $Γ_{gh}$ is the expected relation between the g-th cluster and the h-th cluster of type 1 objects; similarly, Υ provides an intuitive summary of the cluster structures between the different types of objects. In the above formulation, different Bregman divergences, $D_{ϕ_1}$, $D_{ϕ_2}$, and $D_{ϕ_3}$, are used for the attributes, homogeneous relations and heterogeneous relations, since they could have different distributions in real applications. For example, suppose

$Θ^{(1)} = [\begin{matrix} 1.1 & 2.3 \\ 1.5 & 2.5 \end{matrix}]$

for normal distribution,

$Γ^{(1)} = [\begin{matrix} 0.9 & 0.1 \\ 0.1 & 0.7 \end{matrix}]$

for Bernoulli distribution, and

$Υ^{(12)} = [\begin{matrix} 1 & 3 \\ 3 & 1 \end{matrix}]$

for Poisson distribution; then the cluster structures of the data are very intuitive. First, the center attribute vectors for the two clusters of type 1 are

$[\begin{matrix} 1.1 \\ 1.5 \end{matrix}] \text{ and } [\begin{matrix} 2.3 \\ 2.5 \end{matrix}];$

second, by $Γ^{(1)}$ we know that the type 1 nodes from different clusters are barely related and that cluster 1 is denser than cluster 2; third, by $Υ^{(12)}$ we know that cluster 1 of type 1 nodes is related to cluster 2 of type 2 nodes more strongly than to cluster 1 of type 2 nodes, and so on.

Since the distributions of C⁽¹⁾and C⁽²⁾are modeled as multinomial distributions:

$\begin{matrix} \Pr(C^{(1)} \mid Λ^{(1)}) = \prod_{p=1}^{n_1} \prod_{g=1}^{k_1} (Λ_{gp}^{(1)})^{C_{gp}^{(1)}}, & (7) \\ \Pr(C^{(2)} \mid Λ^{(2)}) = \prod_{q=1}^{n_2} \prod_{h=1}^{k_2} (Λ_{hq}^{(2)})^{C_{hq}^{(2)}}. & (8) \end{matrix}$

Substituting Eqs. (4), (5), (6), (7), and (8) into Eq. (2) and performing some algebraic manipulations, the following log-likelihood function is obtained for MMRC under exponential families,

$\begin{matrix} \log L(Ω \mid Ψ) = \sum_{p=1}^{n_1} \sum_{g=1}^{k_1} C_{gp}^{(1)} \log Λ_{gp}^{(1)} + \sum_{q=1}^{n_2} \sum_{h=1}^{k_2} C_{hq}^{(2)} \log Λ_{hq}^{(2)} - D_{ϕ_1}(F, Θ C^{(1)}) - D_{ϕ_2}(S, (C^{(1)})^{T} Γ C^{(1)}) - D_{ϕ_3}(R, (C^{(1)})^{T} Υ C^{(2)}) + τ & (9) \end{matrix}$

where τ=log ƒ_ϕ1(F)+log ƒ_ϕ2(S)+log ƒ_ϕ3(R), which is a constant in the log-likelihood function.

Expectation Maximization (EM) is a general approach to find the maximum-likelihood estimate of the parameters when the model has latent variables. EM does maximum likelihood estimation by iteratively maximizing the expectation of the complete (log-)likelihood, which is the following under the MMRC model,

$Q(Ω, \tilde{Ω}) = E[\log(L(Ω \mid Ψ)) \mid C^{(1)}, C^{(2)}, \tilde{Ω}]$ (10)

where $\tilde{Ω}$ denotes the current estimate of the parameters and Ω denotes the new parameters that are optimized to increase Q. Two steps, the E-step (expectation step) and the M-step (maximization step), are performed alternately to maximize the objective function in Eq. (10).

4.2 Monte Carlo E-Step

In the E-step, based on Bayes' rule, the posterior probability of the latent variables,

$\begin{matrix} \Pr (C^{(1)}, C^{(2)} ❘ F, S, R, \tilde{Ω}) = \frac{\Pr (C^{(1)}, C^{(2)}, F, S, R ❘ \tilde{Ω})}{Σ_{C^{(1)}, C^{(2)}} \Pr (C^{(1)}, C^{(2)}, F, S, R ❘ \tilde{Ω})} & (11) \end{matrix}$

is updated using the current estimate of the parameters. However, conditioned on the observations, the latent variables are not independent, i.e., there exist dependencies between the posterior probabilities of $C^{(1)}$ and $C^{(2)}$, and between those of $C_{\cdot p}^{(1)}$ and $C_{\cdot q}^{(1)}$. Hence, directly computing the posterior based on Eq. (11) is prohibitively expensive.

There exist several techniques for approximating an intractable posterior, such as Monte Carlo approaches, belief propagation, and variational methods. A Monte Carlo approach, the Gibbs sampler, is further analyzed; it is a method of constructing a Markov chain whose stationary distribution is the distribution to be estimated. It is of course understood that other known techniques may be employed.

It is relatively easy to compute the posterior of a latent indicator vector while fixing all other latent indicator vectors, i.e.,

$\begin{matrix} \Pr(C_{\cdot p}^{(1)} \mid C_{\cdot -p}^{(1)}, C^{(2)}, F, S, R, \tilde{Ω}) = \frac{\Pr(C^{(1)}, C^{(2)}, F, S, R \mid \tilde{Ω})}{\sum_{C_{\cdot p}^{(1)}} \Pr(C^{(1)}, C^{(2)}, F, S, R \mid \tilde{Ω})} & (12) \end{matrix}$

where $C_{\cdot -p}^{(1)}$ denotes all the latent indicator vectors of type 1 except for $C_{\cdot p}^{(1)}$. Therefore, the following Markov chain is presented to estimate the posterior in Eq. (11).

Sample $C_{\cdot 1}^{(1)}$ from distribution $\Pr(C_{\cdot 1}^{(1)} \mid C_{\cdot -1}^{(1)}, C^{(2)}, F, S, R, \tilde{Ω})$;
. . .
Sample $C_{\cdot n_1}^{(1)}$ from distribution $\Pr(C_{\cdot n_1}^{(1)} \mid C_{\cdot -n_1}^{(1)}, C^{(2)}, F, S, R, \tilde{Ω})$;
Sample $C_{\cdot 1}^{(2)}$ from distribution $\Pr(C_{\cdot 1}^{(2)} \mid C_{\cdot -1}^{(2)}, C^{(1)}, F, S, R, \tilde{Ω})$;
. . .
Sample $C_{\cdot n_2}^{(2)}$ from distribution $\Pr(C_{\cdot n_2}^{(2)} \mid C_{\cdot -n_2}^{(2)}, C^{(1)}, F, S, R, \tilde{Ω})$.

Note that at each sampling step in the above procedure, the latent indicator variables sampled in previous steps are used. The above procedure iterates until the stop criterion is satisfied. It can be shown that the above procedure is a Markov chain converging to $\Pr(C^{(1)}, C^{(2)} \mid F, S, R, \tilde{Ω})$. Assume that l samples are kept for estimation; then the posterior can be estimated simply by the empirical joint distribution of $C^{(1)}$ and $C^{(2)}$ in the l samples.
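One sweep of the sampler over the type-1 indicators may be sketched as follows; the sweep over type-2 indicators is symmetric and omitted for brevity. This is an illustrative sketch under the exponential-family form of Eqs. (4)-(6), in which the f_ϕ factors cancel in the ratio of Eq. (12). The function names are hypothetical, the divergences `d1`, `d2`, `d3` are supplied by the caller as element-wise functions, class indices are 0-based, and `F[p]` holds the attribute vector of object p.

```python
import math
import random


def conditional_type1(p, c1, c2, lam1, F, S, R, theta, gamma, upsilon, d1, d2, d3):
    """Pr(C_.p = g | all other indicators, data, parameters) for each class g."""
    log_w = []
    for g in range(len(lam1[p])):
        if lam1[p][g] <= 0.0:
            log_w.append(-math.inf)
            continue
        lw = math.log(lam1[p][g])
        # Attribute term: divergence between F_.p and the g-th column of Theta.
        lw -= sum(d1(F[p][a], theta[a][g]) for a in range(len(theta)))
        # Homogeneous terms: every entry of S in row p or column p.
        for q in range(len(c1)):
            if q == p:
                lw -= d2(S[p][p], gamma[g][g])
            else:
                lw -= d2(S[p][q], gamma[g][c1[q]]) + d2(S[q][p], gamma[c1[q]][g])
        # Heterogeneous terms: row p of R.
        lw -= sum(d3(R[p][q], upsilon[g][c2[q]]) for q in range(len(c2)))
        log_w.append(lw)
    top = max(log_w)                      # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def sweep_type1(c1, c2, lam1, F, S, R, theta, gamma, upsilon, d1, d2, d3, rng):
    """Resample each type-1 indicator in turn, reusing the newest values of the others."""
    for p in range(len(c1)):
        probs = conditional_type1(p, c1, c2, lam1, F, S, R,
                                  theta, gamma, upsilon, d1, d2, d3)
        c1[p] = rng.choices(range(len(probs)), weights=probs)[0]
    return c1


def half_sq(x, mu):
    """Half squared error, used here for all three observation types."""
    return 0.5 * (x - mu) ** 2


# Toy data: two well-separated type-1 objects, one type-2 object, no relations.
F = [[0.1], [9.8]]
S = [[0, 0], [0, 0]]
R = [[0], [0]]
theta = [[0.0, 10.0]]
gamma = [[0.0, 0.0], [0.0, 0.0]]
upsilon = [[0.0], [0.0]]
lam1 = [[0.5, 0.5], [0.5, 0.5]]
c1 = sweep_type1([1, 0], [0], lam1, F, S, R, theta, gamma, upsilon,
                 half_sq, half_sq, half_sq, random.Random(0))
```

Each conditional costs time proportional to the number of objects related to object p, so a full sweep touches every entry of S and R a constant number of times.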

4.3 M-Step

After the E-step, the posterior probability of latent variables is available to evaluate the expectation of the complete log-likelihood,

$\begin{matrix} Q (Ω, \tilde{Ω}) = \sum_{C^{(1)}, C^{(2)}}^{} \log (L (Ω ❘ Ψ)) \Pr (C^{(1)}, C^{(2)} ❘ F, S, R, \tilde{Ω}) & (13) \end{matrix}$

In the M-step, the unknown parameters are optimized by

$\begin{matrix} Ω^{*} = \arg \max_{Ω} Q (Ω, \tilde{Ω}) & (14) \end{matrix}$

First, the update rules for the membership parameters $Λ^{(1)}$ and $Λ^{(2)}$ are derived. To derive the expression for each $Λ_{hp}^{(1)}$, the Lagrange multiplier α is introduced with the constraint $\sum_{g=1}^{k_1} Λ_{gp}^{(1)} = 1$, and the following equation is solved,

$\begin{matrix} \frac{\partial}{\partial Λ_{hp}^{(1)}} \left\{ Q(Ω, \tilde{Ω}) - α \left( \sum_{g=1}^{k_1} Λ_{gp}^{(1)} - 1 \right) \right\} = 0. & (15) \end{matrix}$

Substituting Eqs. (9) and (13) into Eq. (15), after some algebraic manipulations:

$\Pr(C_{hp}^{(1)} = 1 \mid F, S, R, \tilde{Ω}) - α Λ_{hp}^{(1)} = 0$ (16)

Summing both sides over h, α=1 is obtained, resulting in the following update rule,

$Λ_{hp}^{(1)} = \Pr(C_{hp}^{(1)} = 1 \mid F, S, R, \tilde{Ω}),$ (17)

i.e., $Λ_{hp}^{(1)}$ is updated as the posterior probability that the p-th object is associated with the h-th cluster. Similarly, the following update rule for $Λ_{hp}^{(2)}$ is provided:

$Λ_{hp}^{(2)} = \Pr(C_{hp}^{(2)} = 1 \mid F, S, R, \tilde{Ω})$ (18)

Second, the update rule for Θ is derived. Based on Eqs. (9) and (13), optimizing Θ is equivalent to the following optimization,

$\begin{matrix} \arg \min_{Θ} \sum_{C^{(1)}, C^{(2)}} D_{ϕ_1}(F, Θ C^{(1)}) \Pr(C^{(1)}, C^{(2)} \mid F, S, R, \tilde{Ω}) & (19) \end{matrix}$

The above expression may be reformulated as,

$\begin{matrix} \arg \min_{Θ} \sum_{C^{(1)}}^{} \sum_{g = 1}^{k_{1}} \sum_{p : C_{gp}^{(1)} = 1}^{} D_{ϕ_{1}} (F_{\cdot p}, Θ_{\cdot g}) \Pr (C_{gp}^{(1)} = 1 ❘ F, S, R, \tilde{Ω}) . & (20) \end{matrix}$

To solve the above optimization, an important property of Bregman divergence presented in the following theorem may be used.

Theorem 1.

Let X be a random variable taking values in $\mathcal{X} = \{x_i\}_{i=1}^{n} \subset S \subseteq \mathbb{R}^{d}$ following the distribution v. Given a Bregman divergence $D_ϕ : S \times \mathrm{int}(S) \to [0, \infty)$, the problem

$\begin{matrix} \min_{s \in S} E_{v}[D_{ϕ}(X, s)] & (21) \end{matrix}$

has a unique minimizer given by $s^{*} = E_{v}[X]$.

The proof of Theorem 1 is omitted (see [3, 40]). Theorem 1 states that the Bregman representative of a random variable is always the expectation of the variable. Based on Theorem 1 and the objective function in (20), we update $Θ_{\cdot g}$ as follows,

$\begin{matrix} Θ_{\cdot g} = \frac{\sum_{p = 1}^{n_{1}} F_{\cdot p} \Pr (C_{gp}^{(1)} = 1 | F, S, R, \tilde{Ω})}{\sum_{p = 1}^{n_{1}} \Pr (C_{gp}^{(1)} = 1 | F, S, R, \tilde{Ω})} & (22) \end{matrix}$

Third, the update rule for Γ is derived. Based on Eqs. (9) and (13), optimizing Γ is formulated as the following optimization,

$\begin{matrix} \arg \min_{Γ} \sum_{C^{(1)}} \sum_{g = 1}^{k_{1}} \sum_{h = 1}^{k_{1}} \sum_{\underset{q : C_{hq}^{(1)} = 1}{p : C_{gp}^{(1)} = 1}} D_{ϕ_{2}} (S_{pq}, Γ_{gh}) \tilde{p} & (23) \end{matrix}$

where $\tilde{p}$ denotes $\Pr(C_{gp}^{(1)} = 1, C_{hq}^{(1)} = 1 \mid F, S, R, \tilde{Ω})$ and $1 \le p, q \le n_1$. Based on Theorem 1, we update each $Γ_{gh}$ as follows,

$\begin{matrix} Γ_{gh} = \frac{\sum_{p, q = 1}^{n_{1}} S_{pq} \Pr (C_{gp}^{(1)} = 1, C_{hq}^{(1)} = 1 | F, S, R, \tilde{Ω})}{\sum_{p, q = 1}^{n_{1}} \Pr (C_{gp}^{(1)} = 1, C_{hq}^{(1)} = 1 | F, S, R, \tilde{Ω})} & (24) \end{matrix}$

Fourth, the update rule for Υ is derived. Based on Eqs. (9) and (13), optimizing Υ is formulated as the following optimization,

$\begin{matrix} \arg \min_{Υ} \sum_{C^{(1)}, C^{(2)}} \sum_{g = 1}^{k_{1}} \sum_{h = 1}^{k_{2}} \sum_{\underset{q : C_{hq}^{(2)} = 1}{p : C_{gp}^{(1)} = 1}} D_{ϕ_{3}} (R_{pq}, Υ_{gh}) \tilde{p}, & (25) \end{matrix}$

where $\tilde{p}$ denotes $\Pr(C_{gp}^{(1)} = 1, C_{hq}^{(2)} = 1 \mid F, S, R, \tilde{Ω})$, $1 \le p \le n_1$ and $1 \le q \le n_2$. Based on Theorem 1, each $Υ_{gh}$ is updated as follows,

$\begin{matrix} Υ_{gh} = \frac{\sum_{p = 1}^{n_{1}} \sum_{q = 1}^{n_{2}} R_{pq} \Pr (C_{gp}^{(1)} = 1, C_{hq}^{(2)} = 1 | F, S, R, \tilde{Ω})}{\sum_{p = 1}^{n_{1}} \sum_{q = 1}^{n_{2}} \Pr (C_{gp}^{(1)} = 1, C_{hq}^{(2)} = 1 | F, S, R, \tilde{Ω})} & (26) \end{matrix}$
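Update rules (17), (18), (22), (24) and (26) may be sketched as weighted means computed from the samples kept in the E-step. The sketch is illustrative: the function name `m_step` is hypothetical, the posteriors are approximated by empirical frequencies over the retained samples, class indices are 0-based, and `F[p]` holds the attribute vector of object p.

```python
def m_step(samples, F, S, R, k1, k2):
    """samples: list of (c1, c2) class-index lists kept from the E-step sampler."""
    l = len(samples)
    n1, n2, d = len(S), len(R[0]), len(F[0])
    # Eqs. (17)-(18): memberships are the empirical marginal posteriors.
    lam1 = [[sum(c1[p] == g for c1, _ in samples) / l for g in range(k1)]
            for p in range(n1)]
    lam2 = [[sum(c2[q] == h for _, c2 in samples) / l for h in range(k2)]
            for q in range(n2)]
    # Eq. (22): each column of Theta is a posterior-weighted mean of attribute vectors.
    theta = [[0.0] * k1 for _ in range(d)]
    for g in range(k1):
        weight = sum(lam1[p][g] for p in range(n1))
        for a in range(d):
            if weight > 0:
                theta[a][g] = sum(F[p][a] * lam1[p][g] for p in range(n1)) / weight
    # Eqs. (24) and (26): joint posteriors are estimated by co-occurrence counts;
    # the common factor 1/l cancels between numerator and denominator.
    g_num = [[0.0] * k1 for _ in range(k1)]
    g_den = [[0] * k1 for _ in range(k1)]
    u_num = [[0.0] * k2 for _ in range(k1)]
    u_den = [[0] * k2 for _ in range(k1)]
    for c1, c2 in samples:
        for p in range(n1):
            for q in range(n1):
                g_num[c1[p]][c1[q]] += S[p][q]
                g_den[c1[p]][c1[q]] += 1
            for q in range(n2):
                u_num[c1[p]][c2[q]] += R[p][q]
                u_den[c1[p]][c2[q]] += 1
    gamma = [[g_num[g][h] / g_den[g][h] if g_den[g][h] else 0.0
              for h in range(k1)] for g in range(k1)]
    upsilon = [[u_num[g][h] / u_den[g][h] if u_den[g][h] else 0.0
                for h in range(k2)] for g in range(k1)]
    return lam1, lam2, theta, gamma, upsilon


# Two identical samples: object 0 in class 0, object 1 in class 1; one type-2 class.
samples = [([0, 1], [0]), ([0, 1], [0])]
lam1, lam2, theta, gamma, upsilon = m_step(
    samples, F=[[1.0], [3.0]], S=[[1, 0], [0, 1]], R=[[2], [4]], k1=2, k2=1)
```

Empty clusters are left at zero here purely to keep the sketch short; a practical implementation would need its own policy for them.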

Combining the E-step and M-step, a general relational clustering algorithm is provided, the Exponential Family MMRC (EF-MMRC) algorithm, which is summarized in Algorithm 1. Since it is straightforward to apply the algorithm derivation to a relational data set of any structure, Algorithm 1 is stated for the input of a general relational data set. Although the input relational data could have various structures, EF-MMRC works simply as follows: in the E-step, EF-MMRC iteratively updates the posterior probabilities that an object is associated with the clusters (the Markov chain in Section 4.2); in the M-step, based on the current cluster association (posterior probabilities), the cluster representatives of attributes and relations are updated as the weighted mean of the observations, no matter which exponential distributions are assumed.

Therefore, with the simplicity of the traditional centroid-based clustering algorithms, EF-MMRC is capable of making use of all attribute information and homogeneous and heterogeneous relation information to learn hidden structures from various relational data. Since EF-MMRC simultaneously clusters multi-type interrelated objects, the cluster structures of different types of objects may interact with each other directly or indirectly during the clustering process to automatically deal with the influence propagation. Besides the local cluster structures for each type of objects, the output of EF-MMRC also provides the summary of the global hidden structure for the data, i.e., based on Γ and Υ, we know how the clusters of the same type and different types are related to each other. Furthermore, relational data from different applications may have different probabilistic distributions on the attributes and relations; it is easy for EF-MMRC to adapt to this situation by simply using different Bregman divergences corresponding to different exponential family distributions.

Algorithm 1 Exponential Family MMRC Algorithm

Input: A relational data set $\{\{F^{(j)}\}_{j=1}^{m}, \{S^{(j)}\}_{j=1}^{m}, \{R^{(ij)}\}_{i,j=1}^{m}\}$,

a set of exponential family distributions (Bregman divergences) assumed for the data set.

Output: Membership matrices $\{Λ^{(j)}\}_{j=1}^{m}$,

attribute expectation matrices $\{Θ^{(j)}\}_{j=1}^{m}$,

homogeneous relation expectation matrices $\{Γ^{(j)}\}_{j=1}^{m}$, and

heterogeneous relation expectation matrices $\{Υ^{(ij)}\}_{i,j=1}^{m}$.

Method:

1: Initialize the parameters as $\tilde{Ω} = \{\{\tilde{Λ}^{(j)}\}_{j=1}^{m}, \{\tilde{Θ}^{(j)}\}_{j=1}^{m}, \{\tilde{Γ}^{(j)}\}_{j=1}^{m}, \{\tilde{Υ}^{(ij)}\}_{i,j=1}^{m}\}$.

2: repeat

3: {E-step}

4: Compute the posterior $\Pr(\{C^{(j)}\}_{j=1}^{m} | \{F^{(j)}\}_{j=1}^{m}, \{S^{(j)}\}_{j=1}^{m}, \{R^{(ij)}\}_{i,j=1}^{m}, \tilde{Ω})$ using the Gibbs sampler.

5: {M-step}

6: for j=1 to m do

7: Compute $Λ^{(j)}$ using update rule (17).

8: Compute $Θ^{(j)}$ using update rule (22).

9: Compute $Γ^{(j)}$ using update rule (24).

10: for i=1 to m do

11: Compute $Υ^{(ij)}$ using update rule (26).

12: end for

13: end for

14: $\tilde{Ω} = Ω$

15: until convergence

If we assume O(m) types of heterogeneous relations among m types of objects, which is typical in real applications, and let n=Θ(n_i) and k=Θ(k_i), the computational complexity of EF-MMRC can be shown to be O(tmn²k) for t iterations. If the k-means algorithm is applied to each type of nodes individually by transforming the relations into attributes for each type of nodes, the total computational complexity is also O(tmn²k).

4.4 Hard MMRC Algorithm

Due to its simplicity, scalability, and broad applicability, the k-means algorithm has become one of the most popular clustering algorithms. Hence, it is desirable to extend k-means to relational data. Some efforts [47, 2, 12, 33] in the literature work in this direction. However, these approaches apply to only some special and simple cases of relational data, such as bi-type heterogeneous relational data.

As traditional k-means can be formulated as a hard version of the Gaussian mixture model EM algorithm [29], the hard version of the MMRC algorithm is presented as a general relational k-means algorithm that applies to various relational data (Algorithm 1 is herein referred to as “soft EF-MMRC”).

To derive the hard version of the MMRC algorithm, the soft membership parameters Λ^(j) are omitted from the MMRC model (C^(j) in the model provides the hard membership for each object). Next, the computation of the posterior probabilities in the E-step is changed to a reassignment procedure, i.e., in the E-step, based on the current parameter estimates, the cluster labels {C^(j)}_{j=1}^m are reassigned to maximize the objective function in (9). In particular, for each object, while the cluster assignments of all other objects are held fixed, each candidate cluster is tried in turn and the object is assigned to the cluster that maximizes the objective function in (9), which is equivalent to minimizing the Bregman distances between the observations and the corresponding expectation parameters. After all objects are assigned, the reassignment process is repeated until no object changes its cluster assignment between two successive iterations.

In the M-step, the parameters are estimated based on the cluster assignments from the E-step. A simple way to derive the update rules is to follow the derivation in Section 4.3 but replace the posterior probabilities by their hard versions. For example, after the E-step, if the object x_p^(j) is assigned to the g-th cluster, i.e., C_gp^(j)=1, then the posterior $\Pr(C_{gp}^{(j)} = 1 | F, S, R, \tilde{Ω}) = 1$ and $\Pr(C_{hp}^{(j)} = 1 | F, S, R, \tilde{Ω}) = 0$ for h≠g.

Using the hard versions of the posterior probabilities, the following update rule is derived:

$\begin{matrix} Θ_{\cdot g}^{(j)} = \frac{\sum_{p : C_{gp}^{(j)} = 1} F_{\cdot p}^{(j)}}{\sum_{p = 1}^{n_{j}} C_{gp}^{(j)}} . & (27) \end{matrix}$

In the above update rule, since $\sum_{p=1}^{n_j} C_{gp}^{(j)}$ is the size of the g-th cluster, Θ_·g^(j) is actually updated as the mean of the attribute vectors of the objects assigned to the g-th cluster. Similarly, the following update rule is derived:

$\begin{matrix} Γ_{gh}^{(j)} = \frac{\sum_{p : C_{gp}^{(j)} = 1, q : C_{hq}^{(j)} = 1} S_{pq}^{(j)}}{\sum_{p = 1}^{n_{j}} C_{gp}^{(j)} \sum_{q = 1}^{n_{j}} C_{hq}^{(j)}} & (28) \end{matrix}$

i.e., Γ_gh^(j) is updated as the mean of the relations between the objects of the j-th type from the g-th cluster and from the h-th cluster.

Each heterogeneous relation expectation parameter Υ_gh^(ij) is updated as the mean of the relations between the objects of the i-th type from the g-th cluster and the objects of the j-th type from the h-th cluster,

$\begin{matrix} Υ_{gh}^{(ij)} = \frac{\sum_{p : C_{gp}^{(i)} = 1, q : C_{hq}^{(j)} = 1} R_{pq}^{(ij)}}{\sum_{p = 1}^{n_{i}} C_{gp}^{(i)} \sum_{q = 1}^{n_{j}} C_{hq}^{(j)}} & (29) \end{matrix}$
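The three hard update rules (27) to (29) are all block means and can be written compactly with indicator matrices. The following NumPy sketch is illustrative only; the function name `hard_m_step` and the restriction to two object types are assumptions, and every cluster is assumed to be non-empty.

```python
import numpy as np

def hard_m_step(F, S, R, C1, C2):
    """Hard M-step sketch for type-1 attributes/relations and one 1-2 relation.

    F  : (d, n1) attribute matrix, one column per type-1 object
    S  : (n1, n1) homogeneous relations among type-1 objects
    R  : (n1, n2) heterogeneous relations between type 1 and type 2
    C1 : (k1, n1) 0/1 cluster indicator matrix for type 1
    C2 : (k2, n2) 0/1 cluster indicator matrix for type 2
    Assumes no cluster is empty (otherwise the divisions are undefined).
    """
    size1 = C1.sum(axis=1)                                # cluster sizes, type 1
    size2 = C2.sum(axis=1)                                # cluster sizes, type 2
    theta = (F @ C1.T) / size1                            # Eq. (27): cluster attribute means
    gamma = (C1 @ S @ C1.T) / np.outer(size1, size1)      # Eq. (28): within-type block means
    upsilon = (C1 @ R @ C2.T) / np.outer(size1, size2)    # Eq. (29): cross-type block means
    return theta, gamma, upsilon
```

Writing the sums as matrix products keeps each update to a single line and makes the "mean of a block" interpretation explicit.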

The hard version of EF-MMRC algorithm is summarized in Algorithm 2. It works simply as the classic k-means. However, it is applicable to various relational data under various Bregman distance functions corresponding to various assumptions of probability distributions. Based on the EM framework, its convergence is guaranteed. When applied to some special cases of relational data, it provides simple and new algorithms for some important data mining problems. For example, when applied to the data of one homogeneous relation matrix representing a graph affinity matrix, it provides a simple and new graph partitioning algorithm.

Algorithms 1 and 2 may also be combined into a mixed EF-MMRC. For example, hard EF-MMRC may be run several times as initialization, followed by soft EF-MMRC.

Algorithm 2 Hard MMRC Algorithm

Input: A relational data set $\{\{F^{(j)}\}_{j=1}^{m}, \{S^{(j)}\}_{j=1}^{m}, \{R^{(ij)}\}_{i,j=1}^{m}\}$,

a set of exponential family distributions (Bregman divergences) assumed for the data set.

Output: Cluster indicator matrices $\{C^{(j)}\}_{j=1}^{m}$,

attribute expectation matrices $\{Θ^{(j)}\}_{j=1}^{m}$,

homogeneous relation expectation matrices $\{Γ^{(j)}\}_{j=1}^{m}$, and

heterogeneous relation expectation matrices $\{Υ^{(ij)}\}_{i,j=1}^{m}$.

Method:

1: Initialize the parameters as $\tilde{Ω} = \{\{\tilde{Λ}^{(j)}\}_{j=1}^{m}, \{\tilde{Θ}^{(j)}\}_{j=1}^{m}, \{\tilde{Γ}^{(j)}\}_{j=1}^{m}, \{\tilde{Υ}^{(ij)}\}_{i,j=1}^{m}\}$.

2: repeat

3: {E-step}

4: Based on the current parameters, reassign cluster labels for each object, i.e., update $\{C^{(j)}\}_{j=1}^{m}$, to maximize the objective function in Eq. (9).

5: {M-step}

6: for j=1 to m do

7: Compute $Θ^{(j)}$ using update rule (27).

8: Compute $Γ^{(j)}$ using update rule (28).

9: for i=1 to m do

10: Compute $Υ^{(ij)}$ using update rule (29).

11: end for

12: end for

13: $\tilde{Ω} = Ω$

14: until convergence
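As a concrete instance of Algorithm 2, the following NumPy sketch applies the hard procedure to a single symmetric homogeneous relation matrix S under squared Euclidean distance (the normal distribution case). It is a simplified sketch under stated assumptions, not the full algorithm: it handles one object type only, performs one reassignment sweep per M-step instead of sweeping until no label changes, and takes the initial labels from the caller. The names `block_means`, `hard_mmrc_graph`, `init_labels`, and `max_iter` are hypothetical.

```python
import numpy as np

def block_means(S, labels, k):
    """Update rule (28): mean relation value within each pair of clusters."""
    C = np.eye(k)[labels].T                       # (k, n) indicator matrix
    sizes = np.maximum(C.sum(axis=1), 1.0)        # guard against empty clusters
    return (C @ S @ C.T) / np.outer(sizes, sizes)

def hard_mmrc_graph(S, k, init_labels, max_iter=50):
    """Hard relational k-means sketch on a symmetric relation matrix S."""
    labels = np.array(init_labels, dtype=int)
    n = S.shape[0]
    for _ in range(max_iter):
        gamma = block_means(S, labels, k)         # M-step
        changed = False
        for p in range(n):                        # E-step: one sweep over objects
            # Squared error of row p if p were placed in each cluster g,
            # with all other labels fixed. Row and column p contribute
            # equally because S is symmetric, hence the factor 2.
            sq = (S[p][None, :] - gamma[:, labels]) ** 2
            sq[:, p] = 0.0                        # treat the diagonal entry separately
            cost = 2.0 * sq.sum(axis=1) + (S[p, p] - np.diag(gamma)) ** 2
            best = int(np.argmin(cost))
            if best != labels[p]:
                labels[p] = best
                changed = True
        if not changed:
            break
    return labels, block_means(S, labels, k)
```

For example, on a block-diagonal affinity matrix with one object initially misassigned, the sweep moves that object to its block and the loop stops at the next iteration. As with k-means, the result depends on the initialization and may be a local optimum.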

5. A Unified View to Clustering

The connections between existing clustering approaches and the MMRC model and EF-MMRC algorithms are now discussed. By considering these approaches as special cases or variations of the MMRC model, MMRC is shown to provide a unified view of the existing clustering approaches from various important data mining applications.

5.1 Semi-Supervised Clustering

Recently, semi-supervised clustering has become a topic of significant interest [4, 46], which seeks to cluster a set of data points with a set of pairwise constraints.

Semi-supervised clustering can be formulated as a special case of relational clustering, namely clustering on a single-type relational data set consisting of attributes F and homogeneous relations S. For semi-supervised clustering, S_pq denotes the pairwise constraint on the p-th object and the q-th object.

[4] provides a general model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs). It can be formulated as a special case of the MMRC model. As in [4], the homogeneous relation matrix S can be defined as follows,

$S_{pq} = \begin{cases} f_{M} (x_{p}, x_{q}) & \text{if } (x_{p}, x_{q}) \in \mathcal{M} \\ f_{C} (x_{p}, x_{q}) & \text{if } (x_{p}, x_{q}) \in \mathcal{C} \\ 0 & \text{otherwise} \end{cases}$

where

- $\mathcal{M}$ denotes a set of must-link constraints;
- $\mathcal{C}$ denotes a set of cannot-link constraints;
- ƒ_M(x_p, x_q) is a function that penalizes the violation of a must-link constraint; and
- ƒ_C(x_p, x_q) is a penalty function for cannot-links.
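The construction of S from the two constraint sets can be sketched as follows. This is an illustrative sketch only: the function name `constraint_matrix` is hypothetical, the penalty functions are passed as callables on object indices rather than on the objects themselves, and S is filled symmetrically, which the source does not state explicitly.

```python
import numpy as np

def constraint_matrix(n, must_link, cannot_link, f_m, f_c):
    """Build the homogeneous relation matrix S from pairwise constraints.

    n           : number of objects
    must_link   : iterable of index pairs (p, q) in the must-link set
    cannot_link : iterable of index pairs (p, q) in the cannot-link set
    f_m, f_c    : penalty callables taking (p, q) and returning a float
    """
    S = np.zeros((n, n))                 # unconstrained pairs stay at 0
    for p, q in must_link:
        S[p, q] = S[q, p] = f_m(p, q)    # symmetric fill (assumption)
    for p, q in cannot_link:
        S[p, q] = S[q, p] = f_c(p, q)
    return S
```

In practice the penalties in [4] depend on the current cluster assignments as well as on the objects, so a full implementation would recompute the relevant entries as labels change.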

If a Gibbs distribution [41] is assumed for S,

$\begin{matrix} \Pr (S) = \frac{1}{z_{1}} \exp (- \sum_{p, q} S_{pq}) & (30) \end{matrix}$

where z₁ is the normalization constant. Since [4] focuses on only hard clustering, the soft membership parameters may be omitted in the MMRC model to consider hard clustering. Based on Eq. (30) and Eq. (4), the likelihood function of hard semi-supervised clustering under the MMRC model is

$\begin{matrix} \mathcal{L} (Θ | F) = \frac{1}{z} \exp (- \sum_{p, q} S_{pq}) \exp (- D_{ϕ} (F, Θ C)) & (31) \end{matrix}$

Since C is an indicator matrix, Eq. (31) can be formulated as

$\begin{matrix} \mathcal{L} (Θ | F) = \frac{1}{z} \exp (- \sum_{p, q} S_{pq}) \exp (- \sum_{g = 1}^{k} \sum_{p : C_{gp} = 1} D_{ϕ} (F_{\cdot p}, Θ_{\cdot g})) & (32) \end{matrix}$

The above likelihood function is equivalent to the objective function of semi-supervised clustering based on HMRFs [4]. Furthermore, when applied to optimizing the objective function in Eq. (32), hard MMRC provides a family of semi-supervised clustering algorithms similar to HMRF-KMeans in [4]; soft EF-MMRC, on the other hand, provides new soft semi-supervised clustering algorithms.

5.2 Co-Clustering

Co-clustering, or bi-clustering, arises in many important applications, such as document clustering and micro-array data clustering. A number of approaches [12, 8, 33, 2] have been proposed for co-clustering. These efforts can be generalized as solving the following matrix approximation problem [34],

$\begin{matrix} \arg \min_{C, Υ} 𝔇 (R, {(C^{(1)})}^{T} Υ C^{(2)}) & (33) \end{matrix}$

where $R \in \mathbb{R}^{n_1 \times n_2}$ is the data matrix, $C^{(1)} \in \{0,1\}^{k_1 \times n_1}$ and $C^{(2)} \in \{0,1\}^{k_2 \times n_2}$ are indicator matrices, $Υ \in \mathbb{R}^{k_1 \times k_2}$ is the relation representative matrix, and 𝔇 is a distance function. For example, [12] uses KL-divergences as the distance function; [8, 33] use Euclidean distances.

Co-clustering is equivalent to clustering on relational data of one heterogeneous relation matrix R. Based on Eq. (9), by omitting the soft membership parameters, maximizing log-likelihood function of hard clustering on a heterogeneous relation matrix under the MMRC model is equivalent to the minimization in (33). The algorithms proposed in [12, 8, 33, 2] can be viewed as special cases of hard EF-MMRC. At the same time, soft EF-MMRC provides another family of new algorithms for co-clustering.
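The objective in (33) can be evaluated directly once the indicator matrices and Υ are given. The sketch below is illustrative: the function name `coclustering_loss` and the `distance` argument are hypothetical, and only two choices of 𝔇 are shown, squared Euclidean distance and the generalized I-divergence (with the convention 0·log 0 = 0).

```python
import numpy as np

def coclustering_loss(R, C1, C2, upsilon, distance="euclidean", eps=1e-12):
    """Evaluate D(R, C1^T Upsilon C2) for two example distance functions.

    R       : (n1, n2) nonnegative data matrix
    C1, C2  : (k1, n1) and (k2, n2) 0/1 indicator matrices
    upsilon : (k1, k2) relation representative matrix
    """
    approx = C1.T @ upsilon @ C2          # block-constant approximation of R
    if distance == "euclidean":
        return float(np.sum((R - approx) ** 2))
    if distance == "i-divergence":
        # sum x*log(x/y) - x + y, treating entries with x == 0 as contributing y only
        ratio = np.where(R > 0, R / np.maximum(approx, eps), 1.0)
        return float(np.sum(R * np.log(ratio) - R + approx))
    raise ValueError(f"unknown distance: {distance}")
```

Both losses are zero exactly when R equals its block-constant approximation, so they can serve as a quick check that a candidate co-clustering reproduces a perfectly block-structured matrix.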

[34] proposes the relation summary network model for clustering k-partite graphs, which can be shown to be equivalent to clustering on relational data of multiple heterogeneous relation matrices. The algorithms proposed in [34] can also be viewed as special cases of the hard EF-MMRC algorithm.

5.3 Graph Clustering

Graph clustering (partitioning) is an important problem in many domains, such as circuit partitioning, VLSI design, and task scheduling. Existing graph partitioning approaches are mainly based on edge cut objectives, such as the Kernighan-Lin objective [30], normalized cut [42], ratio cut [7], ratio association [42], and min-max cut [13].

Graph clustering is equivalent to clustering on single-type relational data of one homogeneous relation matrix S. The log-likelihood function of the hard clustering under the MMRC model is −D_ϕ(S,(C)^TΓC). We propose the following theorem to show that the edge cut objectives are mathematically equivalent to a special case of the MMRC model. Since most graph partitioning objective functions use a weighted indicator matrix such that CC^T=I_k, where I_k is an identity matrix, we follow this formulation in the following theorem.

Theorem 2.

With Γ restricted to the form rI_k for r>0, maximizing the log-likelihood of hard MMRC clustering on S under normal distribution, i.e.,

$\begin{matrix} \max_{C \in \{0, 1\}^{k \times n},\, CC^{T} = I_{k}} \; - \| S - {(C)}^{T} (r I_{k}) C \|^{2}, & (34) \end{matrix}$

is equivalent to the trace maximization

max tr(CSC^T), (35)

where tr denotes the trace of a matrix.

Proof.

Let L denote the objective function in Eq. (34).

$\begin{matrix} L = - { S - {rC}^{T} C }^{2} \\ = - tr ((S - {rC}^{T} C) (S - {rC}^{T} C)) \\ = - tr (S^{T} S) + 2 r tr (C^{T} CS) - r^{2} tr (C^{T} {CC}^{T} C) \\ = - tr (S^{T} S) + 2 r tr ({CSC}^{T}) - r^{2} k \end{matrix}$

The above deduction uses the trace property tr(XY)=tr(YX) and the constraint CC^T=I_k. Since tr(S^TS), r and k are constants and r>0, the maximization of L is equivalent to the maximization of tr(CSC^T).

The proof is completed.
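The identity used in the proof can be checked numerically. The sketch below is only a sanity check, not part of the proof: it builds a weighted indicator matrix (each row scaled by the inverse square root of the cluster size, so that CC^T = I_k) and compares both sides of the derivation. The function names are hypothetical.

```python
import numpy as np

def weighted_indicator(labels, k):
    """Indicator matrix with rows scaled so that C @ C.T is the identity.
    Assumes every cluster has at least one member."""
    C = np.eye(k)[labels].T                               # (k, n) 0/1 indicators
    return C / np.sqrt(C.sum(axis=1, keepdims=True))      # divide row g by sqrt(size_g)

def objective_34(S, C, r):
    """Left-hand side: -||S - r C^T C||^2 (Frobenius norm)."""
    return -np.linalg.norm(S - r * (C.T @ C), "fro") ** 2

def trace_form(S, C, r):
    """Right-hand side: -tr(S^T S) + 2 r tr(C S C^T) - r^2 k."""
    k = C.shape[0]
    return -np.trace(S.T @ S) + 2 * r * np.trace(C @ S @ C.T) - r ** 2 * k
```

For any fixed r > 0, the two functions agree for every valid weighted indicator matrix, so ranking candidate partitions by `objective_34` is the same as ranking them by tr(CSC^T).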

Since it is shown in the literature [10] that the edge cut objectives can be formulated as the trace maximization, Theorem 2 states that edge-cut based graph clustering is equivalent to the MMRC model under normal distribution with the diagonal constraint on the parameter matrix Γ. This connection provides not only a new understanding of graph partitioning but also a family of new algorithms (soft and hard MMRC algorithms) for graph clustering.

Finally, we point out that the MMRC model does not exclude traditional attribute-based clustering. When applied to an attribute data matrix under Euclidean distances, the hard MMRC algorithm reduces to the classic k-means; the soft MMRC algorithm is very close to the traditional mixture model EM clustering except that it does not involve mixing proportions in the computation.

In summary, the MMRC model provides a principled framework to unify various important clustering tasks including traditional attribute-based clustering, semi-supervised clustering, co-clustering and graph clustering; soft and hard EF-MMRC algorithms unify a number of state-of-the-art clustering algorithms and at the same time provide new solutions to various clustering tasks.

6. Experiments

This section provides empirical evidence to show the effectiveness of the MMRC model and algorithms. Since a number of state-of-the-art clustering algorithms [12, 8, 33, 2, 3, 4] can be viewed as special cases of EF-MMRC model and algorithms, the experimental results in these efforts also illustrate the effectiveness of the MMRC model and algorithms. MMRC algorithms are applied to tasks of graph clustering, bi-clustering, tri-clustering, and clustering on a general relational data set of all three types of information. In the experiments, mixed version MMRC was employed, i.e., hard MMRC initialization followed by soft MMRC. Although MMRC can adopt various distribution assumptions, due to space limit, MMRC is used under normal or Poisson distribution assumption in the experiments. However, this does not imply that they are optimal distribution assumptions for the data. Therefore, one can select or derive an optimal distribution assumption as may be appropriate.

For performance measure, the Normalized Mutual Information (NMI) [44] between the resulting cluster labels and the true cluster labels was used, which is a standard way to measure the cluster quality. The final performance score is the average of ten runs.
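For reference, NMI can be computed from two label vectors as in the following NumPy sketch. The normalization by the geometric mean of the two entropies is an assumption here, since several NMI variants exist and the text does not say which one was used; the function name `nmi` is hypothetical.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information, I(A;B) / sqrt(H(A) * H(B)).

    The geometric-mean normalization is an assumed variant.
    Returns 0.0 if either labeling has a single cluster.
    """
    a = np.unique(labels_a, return_inverse=True)[1].ravel()   # relabel to 0..ka-1
    b = np.unique(labels_b, return_inverse=True)[1].ravel()   # relabel to 0..kb-1
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)                             # contingency table
    joint /= joint.sum()                                      # joint distribution
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)             # marginals
    nz = joint > 0                                            # skip empty cells (0 log 0 = 0)
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz]))
    ha = -np.sum(pa * np.log(pa))
    hb = -np.sum(pb * np.log(pb))
    if ha <= 0 or hb <= 0:
        return 0.0
    return float(mi / np.sqrt(ha * hb))
```

The score is invariant to permutations of the cluster labels, which is why it is suitable for comparing a clustering against ground-truth classes.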

TABLE 1

Summary of relational data for Graph Clustering.

| Name | n | k | Balance | Source |
|---|---|---|---|---|
| tr11 | 414 | 9 | 0.046 | TREC |
| tr23 | 204 | 6 | 0.066 | TREC |
| NG1-20 | 14000 | 20 | 1.0 | 20-newsgroups |
| k1b | 2340 | 6 | 0.043 | WebACE |

6.1 Graph Clustering

Experiments on the MMRC algorithm are presented under normal distribution in comparison with two representative graph partitioning algorithms, the spectral graph partitioning (SGP) from [36] that is generalized to work with both normalized cut and ratio association, and the classic multilevel algorithm, METIS [28].

Graphs based on text data have been widely used to test graph partitioning algorithms [13, 11, 25]. In this study, we use various data sets from the 20-newsgroups [32], WebACE and TREC [27], which cover data sets of different sizes, different balances and different levels of difficulty. The data are pre-processed by removing the stop words, and each document is represented by a term-frequency vector using TF-IDF weights. Relational data are then constructed for each text data set such that objects (documents) are related to each other by the cosine similarities between their term-frequency vectors. A summary of all the data sets used to construct relational data in this paper is shown in Table 1, in which n denotes the number of objects in the relational data, k denotes the number of true clusters, and balance denotes the size ratio of the smallest cluster to the largest cluster.
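The construction of the homogeneous relation matrix from documents can be sketched as follows. The exact TF-IDF variant is not given in the text, so the sketch assumes raw term counts multiplied by log(N/df); the function name `cosine_relation_matrix` is hypothetical.

```python
import numpy as np

def cosine_relation_matrix(counts):
    """Document-document cosine similarities from a term-count matrix.

    counts : (n_docs, n_terms) raw term counts after stop-word removal.
    Assumed weighting: tf * log(n_docs / df).
    """
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)            # document frequency per term
    idf = np.log(n_docs / np.maximum(df, 1))         # guard terms that never occur
    X = counts * idf                                 # TF-IDF weighted vectors
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, 1e-12)                 # unit L2 norm per document
    return X @ X.T                                   # S[p, q] = cosine similarity
```

Under this weighting a term that occurs in every document receives zero weight, so it does not contribute to any similarity. The resulting matrix is symmetric and can be used directly as S in the graph clustering experiments.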

For the number of clusters k, the number of true clusters is used. Determining the optimal number of clusters analytically is a model selection problem; alternatively, the number may be determined empirically or iteratively.

FIG. 2 shows the NMI comparison of the three algorithms. Although there is no single winner on all the graphs, it may be observed overall that the MMRC algorithm performs better than SGP and METIS. In particular, on the difficult data set tr23, MMRC increases performance by about 30%. Hence, MMRC under normal distribution provides a new graph partitioning algorithm which is viable and competitive compared with the two existing state-of-the-art graph partitioning algorithms. Note that although the normal distribution is most popular, MMRC under other distribution assumptions may be more desirable in specific graph clustering applications, depending on the statistical properties of the graphs.

TABLE 2

Subsets of Newsgroup Data for bi-type relational data

| Dataset Name | Newsgroups Included | # Documents per Group | Total # Documents |
|---|---|---|---|
| BT-NG1 | rec.sport.baseball, rec.sport.hockey | 200 | 400 |
| BT-NG2 | comp.os.ms-windows.misc, comp.windows.x, rec.motorcycles, sci.crypt, sci.space | 200 | 1000 |
| BT-NG3 | comp.os.ms-windows.misc, comp.windows.x, misc.forsale, rec.motorcycles, sci.crypt, sci.space, talk.politics.mideast, talk.religion.misc | 200 | 1600 |

TABLE 3

Taxonomy structures of two data sets for constructing tri-partite relational data

| Data set | Taxonomy structure |
|---|---|
| TT-TM1 | {rec.sport.baseball, rec.sport.hockey}, {talk.politics.guns, talk.politics.mideast, talk.politics.misc} |
| TT-TM2 | {comp.graphics, comp.os.ms-windows.misc}, {rec.autos, rec.motorcycles}, {sci.crypt, sci.electronics} |

6.2 Biclustering and Triclustering

The MMRC algorithm is now applied under the Poisson distribution to clustering bi-type relational data (word-document data) and tri-type relational data (word-document-category data). Two algorithms, Bi-partite Spectral Graph Partitioning (BSGP) [11] and Relation Summary Network under Generalized I-divergence (RSN-GI) [34], are used for comparison in bi-clustering. For tri-clustering, Consistent Bipartite Graph Co-partitioning (CBGC) [18] and RSN-GI are used for comparison.

The bi-type relational data, word-document data, are constructed based on various subsets of the 20-Newsgroups data. The data are pre-processed by selecting the top 2000 words by mutual information. The document-word matrix is based on the tf.idf weighting scheme and each document vector is normalized to a unit L₂ norm vector. Specific details of the data sets are listed in Table 2. For example, for the data set BT-NG3, 200 documents are randomly and evenly sampled from each of the corresponding newsgroups; then a bi-type relational data set of 1600 documents and 2000 words is formulated.

The tri-type relational data are built based on the 20-newsgroups data for hierarchical taxonomy mining. In the field of text categorization, hierarchical taxonomy classification is widely used to obtain a better trade-off between effectiveness and efficiency than flat taxonomy classification. To take advantage of hierarchical classification, one must mine a hierarchical taxonomy from the data set. Words, documents, and categories form a sandwich-structured tri-type relational data set, in which documents are the central type of nodes. The links between documents and categories are constructed such that if a document belongs to k categories, the weights of the links between this document and these k category nodes are 1/k (please refer to [18] for details). The true taxonomy structures for the two data sets, TT-TM1 and TT-TM2, are documented in Table 3.
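The 1/k link weighting between documents and categories can be sketched as follows. The function name `document_category_links` and the input format (a list of category index lists, one per document) are assumptions for illustration.

```python
import numpy as np

def document_category_links(memberships, n_categories):
    """Build the document-category relation matrix with 1/k link weights.

    memberships  : list where entry d is the list of category indices
                   that document d belongs to
    n_categories : total number of category nodes
    """
    R = np.zeros((len(memberships), n_categories))
    for d, cats in enumerate(memberships):
        for c in cats:
            R[d, c] = 1.0 / len(cats)    # a document in k categories gets weight 1/k each
    return R
```

With this weighting every document that has at least one category contributes a total link weight of 1, regardless of how many categories it belongs to.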

TABLE 4

Two Clusters from actor-movie data

| Cluster | Members |
|---|---|
| cluster 23 of actors | Viggo Mortensen, Sean Bean, Miranda Otto, Ian Holm, Christopher Lee, Cate Blanchett, Ian McKellen, Liv Tyler, David Wenham, Brad Dourif, John Rhys-Davies, Elijah Wood, Bernard Hill, Sean Astin, Andy Serkis, Dominic Monaghan, Karl Urban, Orlando Bloom, Billy Boyd, John Noble, Sala Baker |
| cluster 118 of movies | The Lord of the Rings: The Fellowship of the Ring (2001); The Lord of the Rings: The Two Towers (2002); The Lord of the Rings: The Return of the King (2003) |

FIG. 3 and FIG. 4 show the NMI comparison of the three algorithms on bi-type and tri-type relational data, respectively. It may be observed that the MMRC algorithm performs significantly better than BSGP and CBGC. MMRC performs slightly better than RSN-GI on some data sets. Since RSN-GI is a special case of hard MMRC, this shows that mixed MMRC improves on hard MMRC's performance on these data sets. Therefore, compared with the existing state-of-the-art algorithms, the MMRC algorithm performs more effectively on these bi-clustering and tri-clustering tasks; it is also flexible enough for other multi-type clustering tasks, which may be more complicated than tri-type clustering.

6.3 A Case Study on Actor-Movie Data

The MMRC algorithm was also run on actor-movie relational data based on the IMDB movie data set for a case study. In the data, actors are related to each other by collaboration (homogeneous relations); actors are related to movies by taking roles in movies (heterogeneous relations); movies have attributes such as release time and rating (note that there are no links between movies). Hence the data have all three types of information. A data set of 20000 actors and 4000 movies is formulated. Experiments were run with k=200. Although there is no ground truth for the data's cluster structure, it may be observed that most resulting clusters are actors or movies of a similar style, such as action, or tight groups from specific movie series. For example, Table 4 shows cluster 23 of actors and cluster 118 of movies; the parameter Υ_{23,118} shows that these two clusters are strongly related to each other. In fact, the actor cluster contains the actors in the movie series “The Lord of the Rings”. Note that with only one type of object (actors), only the actor clusters would be obtained; with two types of nodes, although there are no links between the movies, the related movie clusters are also obtained, which explains how the actors are related.

7. Conclusions

A probabilistic model is formulated for relational clustering, which provides a principled framework to unify various important clustering tasks including traditional attribute-based clustering, semi-supervised clustering, co-clustering and graph clustering. Under this model, parametric hard and soft relational clustering algorithms are presented under a large number of exponential family distributions. The algorithms are applicable to relational data of various structures and at the same time unify a number of state-of-the-art clustering algorithms. The theoretical analysis and experimental evaluation show the effectiveness and great potential of the model and algorithms.

The invention is applicable to various relational data from various applications. It is capable of adopting different distribution assumptions for different relational data with different statistical properties. While the above analysis discusses in depth certain types of statistical distributions, the system and method may be used with any statistical distribution. The resulting parameter matrices provide an intuitive summary of the hidden structure of relational data. Therefore, in addition to finding application in clustering data objects, the present system and method may be used for more general analysis of relationships of data, for other end purposes and/or as an intermediary step in a larger or more complex data analysis paradigm.

The present invention has significant versatility, and can be applied to a wide range of applications involving relational data. Examples include, but are not limited to:

(1) Clustering web documents using both text and link information;

(2) Rating prediction in a recommendation system;

(3) Community detection in social network analysis; and

(4) Discovering gene patterns in bioinformatics applications.

The present method may be implemented on a general purpose computer or a specially adapted machine. Typically, a programmable processor will execute machine-readable instructions stored on a computer-readable medium. In other cases, the method will be implemented using application specific hardware, and may not be reprogrammable.

An exemplary programmable computing device for implementing an embodiment of the invention includes at least a processing unit and a memory. Depending on the exact configuration and type of computing device, the memory may be volatile (such as RAM), nonvolatile (such as ROM, flash memory, etc.) or some combination of the two. Additionally, the device may also have additional features/functionality. For example, the device may also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tapes. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. The memory, the removable storage and the non-removable storage are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory, FRAM, or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the device.

The device may also contain one or more communications connections that allow the device to communicate with other devices. Such communication connections may include, for example, Ethernet, wireless communications, optical communications, serial busses, parallel busses, and the like. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. As discussed above, the term computer readable media as used herein includes both storage media and communication media.

One use for the present method is to process information databases, which may be private or public. For example, the information database may comprise information received from the Internet, such as the content of various web pages from world wide web sites, or other information found on the Internet. In other cases, the data may be more structured, for example the content of the Facebook social networking site/system. Further, the information may be private user information, such as the contents of a user's hard drive, especially, for example, the user generated or downloaded content.

Having described specific embodiments of the present invention, it will be understood that many modifications thereof will readily appear or may be suggested to those skilled in the art, and it is intended therefore that this invention is limited only by the spirit and scope of the following claims.

8. References

The following are expressly incorporated herein by reference:

Bo Long, Mark (Zhongfei) Zhang, Philip S. Yu, “Graph Partitioning Based on Link Distributions”, AAAI (2007).
Bo Long, Mark (Zhongfei) Zhang, Philip S. Yu, “A Probabilistic Framework for Relational Clustering”, KDD (2007).
Bo Long, Mark (Zhongfei) Zhang, Xiaoyun Wu, Philip S. Yu, “Relational Clustering by Symmetric Convex Coding”, Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oreg. (2007).
[1] E. Airoldi, D. Blei, E. Xing, and S. Fienberg. Mixed membership stochastic block models for relational data with application to protein-protein interactions. In ENAR-2006.
[2] A. Banerjee, I. S. Dhillon, J. Ghosh, S. Merugu, and D. S. Modha. A generalized maximum entropy approach to bregman co-clustering and matrix approximation. In KDD, pages 509-514, 2004.
[3] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with bregman divergences. J. Mach. Learn. Res., 6:1705-1749, 2005.
[4] S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In KDD04, pages 59-68, 2004.
[5] I. Bhattacharya and L. Getoor. Entity resolution in graph data. Technical Report CS-TR-4758, University of Maryland, 2005.
[6] T. N. Bui and C. Jones. A heuristic for reducing fill-in in sparse matrix factorization. In PPSC, pages 445-452, 1993.
[7] P. K. Chan, M. D. F. Schlag, and J. Y. Zien. Spectral k-way ratio-cut partitioning and clustering. In DAC '93.
[8] H. Cho, I. Dhillon, Y. Guan, and S. Sra. Minimum sum squared residue co-clustering of gene expression data. In SDM, 2004.
[9] M. Collins, S. Dasgupta, and R. Reina. A generalization of principal component analysis to the exponential family. In NIPS '01, 2001.
[10] I. Dhillon, Y. Guan, and B. Kulis. A unified view of kernel k-means, spectral clustering and graph cuts. Technical Report TR-04-25, University of Texas at Austin, 2004.
[11] I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD '01.
[12] I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In KDD '03, pages 89-98.
[13] C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of ICDM 2001, pages 107-114, 2001.
[14] S. Dzeroski and N. Lavrac, editors. Relational Data Mining. Springer, 2001.
[15] E. Erosheva, S. Fienberg, and J. Lafferty. Mixed membership models of scientific publications. In NAS.
[16] E. Erosheva and S. E. Fienberg. Bayesian mixed membership models for soft clustering and classification. Classification-The Ubiquitous Challenge, pages 11-26, 2005.
[17] S. E. Fienberg, M. M. Meyer, and S. Wasserman. Statistical analysis of multiple sociometric relations. Journal of the American Statistical Association, 80:51-87, 1985.
[18] B. Gao, T.-Y. Liu, X. Zheng, Q.-S. Cheng, and W.-Y. Ma. Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering. In KDD '05, pages 41-50, 2005.
[19] L. Getoor. An introduction to probabilistic graphical models for relational data. Data Engineering Bulletin, 29, 2006.
[20] B. Hendrickson and R. Leland. A multilevel algorithm for partitioning graphs. In Supercomputing '95, Page 28, 1995.
[21] P. Hoff, A. Raftery, and M. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97:1090-1098, 2002.
[22] T. Hofmann. Probabilistic latent semantic analysis. In Proc. of Uncertainty in Artificial Intelligence, UAI '99, Stockholm, 1999.
[23] T. Hofmann and J. Puzicha. Latent class models for collaborative filtering. In IJCAI '99, Stockholm, 1999.
[24] L. B. Holder and D. J. Cook. Graph-based relational learning: current and future directions. SIGKDD Explor. Newsl., 5(1):90-93, 2003.
[25] M. X. H. Zha, C. Ding and H. Simon. Bi-partite graph partitioning and data clustering. In ACM CIKM '01, 2001.
[26] G. Jeh and J. Widom. Simrank: A measure of structural-context similarity. In KDD-2002, 2002.
[27] G. Karypis. A clustering toolkit, 2002.
[28] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359-392, 1998.
[29] M. Kearns, Y. Mansour, and A. Ng. An information-theoretic analysis of hard and soft assignment methods for clustering. In UAI '97, pages 282-293, 1997.
[30] B. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49(2):291-307, 1970.
[31] M. Kirsten and S. Wrobel. Relational distance-based clustering. In Proc. Fachgruppentreffen Maschinelles Lernen (FGML-98), pages 119-124, 1998.
[32] K. Lang. NewsWeeder: Learning to filter netnews. In ICML, 1995.
[33] T. Li. A general model for clustering binary data. In KDD '05, 2005.
[34] B. Long, X. Wu, Z. M. Zhang, and P. S. Yu. Unsupervised learning on k-partite graphs. In KDD-2006, 2006.
[35] B. Long, Z. Zhang, and P. Yu. Co-clustering by block value decomposition. In KDD '05, 2005.
[36] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, 2001.
[37] L. D. Raedt and H. Blockeel. Using logical decision trees for clustering. In Proceedings of the 7th International Workshop on Inductive Logic Programming, 1997.
[38] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, New York, 2000.
[39] N. Rosenberg, J. Pritchard, J. Weber, and H. Cann. Genetic structure of human population. Science, 298, 2002.
[40] S. Della Pietra, V. Della Pietra, and J. Lafferty. Duality and auxiliary functions for Bregman distances. Technical Report CMU-CS-01-109, Carnegie Mellon University, 2001.
[41] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-742, 1984.
[42] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000.
[43] T. Snijders. Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 2002.
[44] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining partitionings. In AAAI 2002, pages 93-98, 2002.
[45] B. Taskar, E. Segal, and D. Koller. Probabilistic classification and clustering in relational data. In Proceeding of IJCAI-01, 2001.
[46] K. Wagstaff, C. Cardie, S. Rogers, and S. Schroedl. Constrained k-means clustering with background knowledge. In ICML-2001, pages 577-584, 2001.
[47] J. Wang, H. Zeng, Z. Chen, H. Lu, L. Tao, and W.-Y. Ma. Recom: reinforcement clustering of multi-type interrelated data objects. In SIGIR '03, pages 274-281, 2003.
[48] E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning with applications to clustering with side information. In NIPS '03, volume 16, 2003.
[49] X. Yin, J. Han, and P. Yu. Cross-relational clustering with user's guidance. In KDD-2005, 2005.
[50] X. Yin, J. Han, and P. Yu. Linkclus: Efficient clustering via heterogeneous semantic links. In VLDB-2006, 2006.
[51] H.-J. Zeng, Z. Chen, and W.-Y. Ma. A unified framework for clustering heterogeneous web objects. In WISE '02, pages 161-172, 2002.

Claims

1. A method of clustering a set of objects having respective object types, respective object attributes, homogeneous relationships between respective objects of the same object type, and heterogeneous relationships between objects having different object types, the method comprising: iteratively optimizing a clustering of the set of objects within a plurality of latent classes, dependent on object type, object attributes, homogeneous relationships, and heterogeneous relationships, by performing: in an expectation step, updating a set of posteriors to maximize a probability that an object is associated with a respective latent class comprising, for each object, individually fixing an assigned latent class for all other objects, and maximizing an objective function for the respective object, comprising minimizing a computed distance between an observation of the object attributes, homogeneous relationships, and heterogeneous relationships of a respective object and parameters of a corresponding expectation that the object is associated with the respective latent class, and repeating until no object changes in assigned latent class between successive repetitions, and in a minimization step, updating the plurality of latent classes based on the updated set of posteriors; and storing the optimized clustering.
2. The method according to claim 1, wherein said computed distance comprises a Bregman distance.
3. The method according to claim 1, wherein the homogeneous and heterogeneous relationships between the members of the set of objects do not comply with an independent and identically distributed (IID) statistical presumption.
4. The method according to claim 1, wherein the posteriors are computed using a Gibbs sampler.
5. The method according to claim 1, wherein a clustering of at least one object is constrained by a clustering of at least one other object.
6. The method according to claim 1, wherein at least one object has at least one of the respective object attributes, homogeneous relationships, and heterogeneous relationships which are labelled, and at least one object has respective object attributes, homogeneous relationships, and heterogeneous relationships which are unlabeled, to provide a semi-supervised clustering.
7. A system for clustering a set of objects having object types, object attributes, homogeneous relationships between objects of the same object type, and heterogeneous relationships between objects having different object types, the system comprising: a programmable processor configured to: iteratively optimize a clustering of the set of objects within a plurality of latent classes, dependent on object types, object attributes, homogeneous relationships, and heterogeneous relationships, by performing, in an expectation step, wherein the programmable processor is configured to: update a set of posteriors to maximize a probability that an object is associated with a respective latent class, comprising, for each object, a substep to individually fix an assigned cluster for all other objects, and maximize an objective function for the respective object, comprising a substep to minimize a computational distance between an observation of the object attributes, homogeneous relationships, and heterogeneous relationships of a respective object and parameters of a corresponding expectation that the object is associated with the respective latent class, and repeat until no object changes in assigned cluster between successive repetitions, and in a minimization step, updating the plurality of latent classes based on the updated set of posteriors; and store the optimized clustering in a memory; and a communications port configured to communicate at least one of an object and clustering-related information.
8. The system according to claim 7, wherein the computational distance comprises a Bregman distance.
9. The system according to claim 7, wherein the homogeneous relationships and heterogeneous relationships between the members of the set of objects do not comply with an independent and identically distributed (IID) statistical presumption.
10. The system according to claim 7, wherein the programmable processor is further configured to compute the posteriors using a Gibbs sampler.
11. The system according to claim 7, wherein a clustering of at least one object is constrained by a clustering of at least one other object.
12. The system according to claim 7, wherein at least one object has at least one of respective object attributes, homogeneous relationships, and heterogeneous relationships which are labelled, and at least one object has at least one of respective object attributes, homogeneous relationships, and heterogeneous relationships which are unlabeled, to provide a semi-supervised clustering.
13. A method of clustering a plurality of objects, having object types, object attributes, homogeneous relationships between objects of the same object type, and heterogeneous relationships between objects having different object types which do not comply with an independent and identically distributed (IID) statistical presumption, the method comprising: optimizing an object clustering of the plurality of objects in a plurality of latent object classes based on at least the object types, object attributes, homogeneous relationships, and heterogeneous relationships, by iteratively: updating a set of posteriors to maximize an expectation probability that an object is associated with a respective latent object class comprising, iteratively maximizing an objective function for each respective object while an assigned latent object class for all other objects is fixed, until no object changes in assigned latent object class occurs, and maximizing an objective function for the respective object, comprising minimizing a computed distance between an observation of the object attributes, homogeneous relationships, and heterogeneous relationships of a respective object and parameters of the corresponding expectation probability, and repeating until no object changes in assigned latent object class occurs, and updating the plurality of latent object classes based on the updated set of posteriors; and at least one of: communicating a latent object class associated with an object, communicating a set of objects within a latent object class, and responding to a query based on the optimized object clustering.
14. The method according to claim 13, wherein the posteriors are computed using a Gibbs sampler.
15. The method according to claim 13, wherein an association of an object with a latent object class is constrained by an association of at least one other object with a latent object class.
16. The method according to claim 13, wherein the object clustering is semi-supervised, and a portion of the set of object attributes, homogeneous relationships, and respective heterogeneous relationships are labelled.
17. The method according to claim 13, wherein the object clustering partitions an arbitrarily complex graph involving at least the data object attributes, the homogeneous relations and the heterogeneous relations.
18. The method according to claim 13, wherein the computed distance comprises a Bregman distance.
19. The method according to claim 13, wherein the plurality of objects comprise a set of hyperlinked objects, wherein the respective object attributes comprise an object information content and the relations between respective data objects comprise hyperlink information.
20. The method according to claim 13, wherein the object attributes, homogeneous relationships between objects of the same object type, and heterogeneous relationships between objects having different object types each have a statistical distribution independently selected from a normal distribution, a Bernoulli distribution, a multinomial distribution and an exponential distribution.
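
By way of non-limiting illustration only, the iterative procedure recited in claims 1, 7 and 13 may be sketched as follows. The sketch is an assumption-laden simplification and not the claimed implementation: it handles a single object type with an attribute matrix X and one homogeneous relation matrix S, it uses the squared Euclidean distance as one example of a Bregman distance, and the function names (sq_dist, update_parameters, reassign, hard_relational_clustering) and the toy data are hypothetical and do not appear in the specification.

```python
import numpy as np

def sq_dist(a, b):
    """Squared Euclidean distance, used here as one example of a Bregman distance."""
    return float(np.sum((a - b) ** 2))

def update_parameters(X, S, labels, k):
    """Re-estimate latent class parameters as within-class means."""
    theta = np.zeros((k, X.shape[1]))  # expected attributes per latent class
    gamma = np.zeros((k, k))           # expected relation strength per class pair
    for a in range(k):
        in_a = labels == a
        if in_a.any():
            theta[a] = X[in_a].mean(axis=0)
        for b in range(k):
            in_b = labels == b
            if in_a.any() and in_b.any():
                gamma[a, b] = S[np.ix_(in_a, in_b)].mean()
    return theta, gamma

def reassign(X, S, labels, theta, gamma, max_sweeps=100):
    """Update one object at a time with all other assignments held fixed,
    sweeping until no object changes its assigned class."""
    labels = labels.copy()
    k = theta.shape[0]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(X)):
            others = np.arange(len(X)) != i
            # Distance between the observation for object i and the
            # parameters expected if object i belonged to class c.
            costs = [
                sq_dist(X[i], theta[c])
                + sq_dist(S[i, others], gamma[c, labels[others]])
                for c in range(k)
            ]
            best = int(np.argmin(costs))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

def hard_relational_clustering(X, S, init_labels, k, max_iter=50):
    """Alternate parameter updates and per-object reassignment until stable."""
    labels = np.asarray(init_labels).copy()
    for _ in range(max_iter):
        theta, gamma = update_parameters(X, S, labels, k)
        new_labels = reassign(X, S, labels, theta, gamma)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data (made up for illustration): two groups of three objects each,
# with relations present only between distinct objects of the same group.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
S = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)

labels = hard_relational_clustering(X, S, [0, 1, 0, 1, 0, 1], k=2)
print(labels.tolist())
```

In this sketch the stored result is simply the returned label array. Heterogeneous relations between objects of different types could be handled in the same manner by adding one further distance term per relation matrix to the per-object cost, and a different Bregman distance could be substituted for sq_dist where the data are better described by, for example, a Bernoulli or multinomial distribution.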

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a Continuation of U.S. patent application Ser. No. 14/672,430, filed Mar. 30, 2015, now U.S. Pat. No. 9,372,915, issued Jun. 21, 2016, which is a Continuation of U.S. patent application Ser. No. 14/217,939, filed Mar. 18, 2014, now U.S. Pat. No. 8,996,528, issued Mar. 31, 2015, which is a Continuation of U.S. patent application Ser. No. 13/628,559, filed Sep. 27, 2012, now U.S. Pat. No. 8,676,805, issued Mar. 18, 2014, which is a Continuation of U.S. patent application Ser. No. 12/538,835, filed Aug. 10, 2009, now U.S. Pat. No. 8,285,719, issued Oct. 9, 2012, which claims benefit of priority from U.S. Provisional Patent Application Ser. No. 61/087,168, filed Aug. 8, 2008, the entirety of which are expressly incorporated herein by reference.

GOVERNMENT RIGHTS CLAUSE

This invention was made with government support under IIS-0535162 awarded by The National Science Foundation and award FA8750-05-2-0284 awarded by AFRL and award FA9550-06-1-0327 awarded by AFOSR. The government has certain rights in the invention.

US Referenced Citations (335)

Number	Name	Date	Kind
5263120	Bickel	Nov 1993	A
5473732	Chang	Dec 1995	A
5933818	Kasravi et al.	Aug 1999	A
6108004	Medl	Aug 2000	A
6298351	Castelli et al.	Oct 2001	B1
6317438	Trebes, Jr.	Nov 2001	B1
6363411	Dugan et al.	Mar 2002	B1
6389436	Chakrabarti et al.	May 2002	B1
6535518	Hu et al.	Mar 2003	B1
6594355	Deo et al.	Jul 2003	B1
6683455	Ebbels et al.	Jan 2004	B2
6708163	Kargupta et al.	Mar 2004	B1
6718338	Vishnubhotla	Apr 2004	B2
6718486	Roselli et al.	Apr 2004	B1
6779030	Dugan et al.	Aug 2004	B1
6788688	Trebes, Jr.	Sep 2004	B2
6850252	Hoffberg	Feb 2005	B1
6954525	Deo et al.	Oct 2005	B2
6970882	Yao et al.	Nov 2005	B2
6985951	Kubala et al.	Jan 2006	B2
7023979	Wu et al.	Apr 2006	B1
7026121	Wohlgemuth et al.	Apr 2006	B1
7035739	Schadt et al.	Apr 2006	B2
7089558	Baskey et al.	Aug 2006	B2
7194134	Bradshaw	Mar 2007	B2
7209964	Dugan et al.	Apr 2007	B2
7212160	Bertoni et al.	May 2007	B2
7227942	Deo et al.	Jun 2007	B2
7235358	Wohlgemuth et al.	Jun 2007	B2
7243112	Qu et al.	Jul 2007	B2
7289985	Zeng et al.	Oct 2007	B2
7430475	Imoto et al.	Sep 2008	B2
7436981	Pace	Oct 2008	B2
7457472	Pace et al.	Nov 2008	B2
7461073	Gao et al.	Dec 2008	B2
7519589	Charnock et al.	Apr 2009	B2
7579148	Wohlgemuth et al.	Aug 2009	B2
7590589	Hoffberg	Sep 2009	B2
7617163	Ben-Hur et al.	Nov 2009	B2
7617176	Zeng et al.	Nov 2009	B2
7640114	Showe et al.	Dec 2009	B2
7644373	Jing et al.	Jan 2010	B2
7645575	Wohlgemuth et al.	Jan 2010	B2
7653491	Schadt et al.	Jan 2010	B2
7657504	Jing et al.	Feb 2010	B2
7676034	Wu et al.	Mar 2010	B1
7676442	Ben-Hur et al.	Mar 2010	B2
7689610	Bansal et al.	Mar 2010	B2
7707208	Jing et al.	Apr 2010	B2
7729864	Schadt	Jun 2010	B2
7743058	Liu et al.	Jun 2010	B2
7747593	Patterson et al.	Jun 2010	B2
7756896	Feingold	Jul 2010	B1
7799519	Caprioli	Sep 2010	B2
7813822	Hoffberg	Oct 2010	B1
7836050	Jing et al.	Nov 2010	B2
7858323	Chinnaiyan et al.	Dec 2010	B2
7877343	Cafarella et al.	Jan 2011	B2
7912714	Kummamuru et al.	Mar 2011	B2
7917911	Bansal et al.	Mar 2011	B2
7961957	Schclar et al.	Jun 2011	B2
7974714	Hoffberg	Jul 2011	B2
8078619	Bansal et al.	Dec 2011	B2
8080371	Ballinger et al.	Dec 2011	B2
8117203	Gazen et al.	Feb 2012	B2
8131567	Dalton	Mar 2012	B2
8135711	Charnock et al.	Mar 2012	B2
8145677	Al-Shameri	Mar 2012	B2
8185481	Long et al.	May 2012	B2
8214424	Arimilli et al.	Jul 2012	B2
8219417	Dalton	Jul 2012	B2
8285719	Long et al.	Oct 2012	B1
8296398	Lacapra et al.	Oct 2012	B2
8352384	Mansinghka et al.	Jan 2013	B2
8370328	Woytowitz et al.	Feb 2013	B2
8370338	Gordo et al.	Feb 2013	B2
8379967	Bush et al.	Feb 2013	B1
8429027	Zheng	Apr 2013	B2
8458074	Showalter	Jun 2013	B2
8504490	Nie et al.	Aug 2013	B2
8521769	Whelan	Aug 2013	B2
8555243	Correll	Oct 2013	B2
8595338	Ravichandran et al.	Nov 2013	B2
8600830	Hoffberg	Dec 2013	B2
8630975	Guo et al.	Jan 2014	B1
8630989	Blohm et al.	Jan 2014	B2
8676805	Long et al.	Mar 2014	B1
8700547	Long et al.	Apr 2014	B2
8713021	Bellegarda	Apr 2014	B2
8719774	Wang et al.	May 2014	B2
8726228	Ravindran et al.	May 2014	B2
8762463	Ravichandran et al.	Jun 2014	B2
8762484	Ravichandran et al.	Jun 2014	B2
8775300	Showalter	Jul 2014	B2
8819121	Ravichandran et al.	Aug 2014	B2
8819122	Ravichandran et al.	Aug 2014	B2
8819207	Ravichandran et al.	Aug 2014	B2
8825746	Ravichandran et al.	Sep 2014	B2
8825830	Newton et al.	Sep 2014	B2
8828668	Axtell et al.	Sep 2014	B2
8838490	Quadracci et al.	Sep 2014	B2
8843356	Schadt et al.	Sep 2014	B2
8843490	Gazen et al.	Sep 2014	B2
8843571	Ravichandran et al.	Sep 2014	B2
8849058	Kennedy et al.	Sep 2014	B2
8849790	Bellare et al.	Sep 2014	B2
8856233	Lacapra et al.	Oct 2014	B2
8874477	Hoffberg	Oct 2014	B2
8886649	Zhang et al.	Nov 2014	B2
8887121	Ravindran et al.	Nov 2014	B2
8903748	Gemulla et al.	Dec 2014	B2
8909514	Toutanova et al.	Dec 2014	B2
8918178	Simon et al.	Dec 2014	B2
8918348	Nie et al.	Dec 2014	B2
8930304	Guo et al.	Jan 2015	B2
8935249	Traub et al.	Jan 2015	B2
8935314	Ravichandran et al.	Jan 2015	B2
8938410	Cafarella et al.	Jan 2015	B2
8983628	Simon et al.	Mar 2015	B2
8983629	Simon et al.	Mar 2015	B2
8983879	Gemulla et al.	Mar 2015	B2
8996350	Dub et al.	Mar 2015	B1
8996528	Long et al.	Mar 2015	B1
9009147	He et al.	Apr 2015	B2
9092517	Paparizos et al.	Jul 2015	B2
9116974	Heit et al.	Aug 2015	B2
9122698	Lacapra et al.	Sep 2015	B2
9128101	Halbert et al.	Sep 2015	B2
9165051	Masud et al.	Oct 2015	B2
9197517	Ravichandran et al.	Nov 2015	B2
9213719	Lacapra et al.	Dec 2015	B2
9213720	Lacapra et al.	Dec 2015	B2
9229924	Sun et al.	Jan 2016	B2
9269051	Guo et al.	Feb 2016	B2
9275135	De et al.	Mar 2016	B2
9305015	Lacapra et al.	Apr 2016	B2
9311670	Hoffberg	Apr 2016	B2
9317569	Nie et al.	Apr 2016	B2
9335977	Wang et al.	May 2016	B2
9353415	Nikolsky et al.	May 2016	B2
9361356	Heit et al.	Jun 2016	B2
9361360	Fang	Jun 2016	B2
9372915	Long et al.	Jun 2016	B2
9727532	Perronnin	Aug 2017	B2
20010034023	Stanton, Jr. et al.	Oct 2001	A1
20010047271	Culbert et al.	Nov 2001	A1
20020103793	Koller et al.	Aug 2002	A1
20020122596	Bradshaw	Sep 2002	A1
20020129082	Baskey et al.	Sep 2002	A1
20020129085	Kubala et al.	Sep 2002	A1
20020129172	Baskey et al.	Sep 2002	A1
20020129274	Baskey et al.	Sep 2002	A1
20030018620	Vishnubhotla	Jan 2003	A1
20030120457	Singh	Jun 2003	A1
20030195889	Yao et al.	Oct 2003	A1
20030219764	Imoto et al.	Nov 2003	A1
20040113953	Newman	Jun 2004	A1
20040162852	Qu et al.	Aug 2004	A1
20050079508	Dering et al.	Apr 2005	A1
20050108200	Meik et al.	May 2005	A1
20050154701	Parunak et al.	Jul 2005	A1
20050170528	West et al.	Aug 2005	A1
20060111849	Schadt et al.	May 2006	A1
20060122816	Schadt et al.	Jun 2006	A1
20060184464	Tseng	Aug 2006	A1
20060241869	Schadt et al.	Oct 2006	A1
20060253262	Ching et al.	Nov 2006	A1
20060263813	Rosenberg	Nov 2006	A1
20060271309	Showe et al.	Nov 2006	A1
20070038386	Schadt et al.	Feb 2007	A1
20070073748	Barney	Mar 2007	A1
20070099239	Tabibiazar et al.	May 2007	A1
20070118498	Song	May 2007	A1
20070130206	Zhou et al.	Jun 2007	A1
20070156736	Bestgen	Jul 2007	A1
20070161009	Kohne	Jul 2007	A1
20070166707	Schadt et al.	Jul 2007	A1
20070172844	Lancaster et al.	Jul 2007	A1
20070174267	Patterson et al.	Jul 2007	A1
20080033897	Lloyd	Feb 2008	A1
20080114800	Gazen et al.	May 2008	A1
20080147654	Cao	Jun 2008	A1
20080154848	Haslam	Jun 2008	A1
20080243479	Cafarella et al.	Oct 2008	A1
20080249999	Renders	Oct 2008	A1
20080294686	Long et al.	Nov 2008	A1
20090006002	Honisch et al.	Jan 2009	A1
20090043797	Dorie	Feb 2009	A1
20090112571	Kummamuru et al.	Apr 2009	A1
20090112588	Kummamuru et al.	Apr 2009	A1
20090228238	Mansinghka et al.	Sep 2009	A1
20090271412	Lacapra et al.	Oct 2009	A1
20090287685	Charnock et al.	Nov 2009	A1
20090307049	Elliott, Jr.	Dec 2009	A1
20090319244	West et al.	Dec 2009	A1
20100015605	Zucman-Rossi et al.	Jan 2010	A1
20100161652	Bellare et al.	Jun 2010	A1
20100179765	Ching et al.	Jul 2010	A1
20100216660	Nikolsky et al.	Aug 2010	A1
20100223276	Al-Shameri et al.	Sep 2010	A1
20100269027	Arimilli et al.	Oct 2010	A1
20100284915	Dai et al.	Nov 2010	A1
20100305058	Lancaster et al.	Dec 2010	A1
20110059861	Nolan et al.	Mar 2011	A1
20110191276	Cafarella et al.	Aug 2011	A1
20110258049	Ramer et al.	Oct 2011	A1
20110282877	Gazen et al.	Nov 2011	A1
20120030646	Ravindran et al.	Feb 2012	A1
20120030647	Wang et al.	Feb 2012	A1
20120030648	Correll	Feb 2012	A1
20120030650	Ravindran et al.	Feb 2012	A1
20120054184	Masud et al.	Mar 2012	A1
20120054226	Cao et al.	Mar 2012	A1
20120143853	Gordo et al.	Jun 2012	A1
20120184449	Hixson et al.	Jul 2012	A1
20120209705	Ramer et al.	Aug 2012	A1
20120209706	Ramer et al.	Aug 2012	A1
20120209707	Ramer et al.	Aug 2012	A1
20120209708	Ramer et al.	Aug 2012	A1
20120209709	Ramer et al.	Aug 2012	A1
20120209710	Ramer et al.	Aug 2012	A1
20120215602	Ramer et al.	Aug 2012	A1
20120215612	Ramer et al.	Aug 2012	A1
20120215622	Ramer et al.	Aug 2012	A1
20120215623	Ramer et al.	Aug 2012	A1
20120215624	Ramer et al.	Aug 2012	A1
20120215625	Ramer et al.	Aug 2012	A1
20120215626	Ramer et al.	Aug 2012	A1
20120215635	Ramer et al.	Aug 2012	A1
20120215639	Ramer et al.	Aug 2012	A1
20120215640	Ramer et al.	Aug 2012	A1
20120290988	Sun et al.	Nov 2012	A1
20120296907	Long et al.	Nov 2012	A1
20130013619	Lacapra et al.	Jan 2013	A1
20130013639	Lacapra et al.	Jan 2013	A1
20130013654	Lacapra et al.	Jan 2013	A1
20130013655	Lacapra et al.	Jan 2013	A1
20130013675	Lacapra et al.	Jan 2013	A1
20130018928	Lacapra et al.	Jan 2013	A1
20130018930	Lacapra et al.	Jan 2013	A1
20130039548	Nielsen et al.	Feb 2013	A1
20130041896	Ghani et al.	Feb 2013	A1
20130066830	Lacapra et al.	Mar 2013	A1
20130066931	Lacapra et al.	Mar 2013	A1
20130116150	Wilcox et al.	May 2013	A1
20130218474	Longo	Aug 2013	A1
20130288244	Deciu et al.	Oct 2013	A1
20130323744	Hahn et al.	Dec 2013	A1
20130337456	Honisch et al.	Dec 2013	A1
20130338933	Deciu et al.	Dec 2013	A1
20140031308	Diane et al.	Jan 2014	A1
20140040855	Wang et al.	Feb 2014	A1
20140122039	Xu et al.	May 2014	A1
20140127716	Longo et al.	May 2014	A1
20140134650	Hawtin et al.	May 2014	A1
20140162887	Martin et al.	Jun 2014	A1
20140172944	Newton et al.	Jun 2014	A1
20140172951	Varney et al.	Jun 2014	A1
20140172952	Varney et al.	Jun 2014	A1
20140172956	Varney et al.	Jun 2014	A1
20140172970	Newton et al.	Jun 2014	A1
20140173023	Varney et al.	Jun 2014	A1
20140173029	Varney et al.	Jun 2014	A1
20140173030	Varney et al.	Jun 2014	A1
20140173038	Newton et al.	Jun 2014	A1
20140173039	Newton et al.	Jun 2014	A1
20140173040	Newton et al.	Jun 2014	A1
20140173041	Newton et al.	Jun 2014	A1
20140173042	Newton et al.	Jun 2014	A1
20140173043	Varney et al.	Jun 2014	A1
20140173044	Varney et al.	Jun 2014	A1
20140173045	Crowder et al.	Jun 2014	A1
20140173046	Crowder et al.	Jun 2014	A1
20140173047	Crowder et al.	Jun 2014	A1
20140173048	Crowder et al.	Jun 2014	A1
20140173052	Newton et al.	Jun 2014	A1
20140173053	Varney et al.	Jun 2014	A1
20140173054	Varney et al.	Jun 2014	A1
20140173061	Lipstone et al.	Jun 2014	A1
20140173062	Lipstone et al.	Jun 2014	A1
20140173064	Newton et al.	Jun 2014	A1
20140173066	Newton et al.	Jun 2014	A1
20140173067	Newton et al.	Jun 2014	A1
20140173077	Newton et al.	Jun 2014	A1
20140173079	Newton et al.	Jun 2014	A1
20140173087	Varney et al.	Jun 2014	A1
20140173088	Varney et al.	Jun 2014	A1
20140173091	Lipstone et al.	Jun 2014	A1
20140173097	Newton et al.	Jun 2014	A1
20140173115	Varney et al.	Jun 2014	A1
20140173131	Newton et al.	Jun 2014	A1
20140173132	Varney et al.	Jun 2014	A1
20140173135	Varney et al.	Jun 2014	A1
20140173338	Arroyo et al.	Jun 2014	A1
20140181171	Dourbal	Jun 2014	A1
20140188780	Guo et al.	Jul 2014	A1
20140222946	Lipstone et al.	Aug 2014	A1
20140222977	Varney et al.	Aug 2014	A1
20140222984	Varney et al.	Aug 2014	A1
20140223002	Varney et al.	Aug 2014	A1
20140223003	Varney et al.	Aug 2014	A1
20140223015	Varney et al.	Aug 2014	A1
20140223016	Varney et al.	Aug 2014	A1
20140223017	Lipstone et al.	Aug 2014	A1
20140223018	Varney et al.	Aug 2014	A1
20140235487	McDevitt et al.	Aug 2014	A1
20140280144	Heit et al.	Sep 2014	A1
20140280145	Heit et al.	Sep 2014	A1
20140337461	Lipstone et al.	Nov 2014	A1
20140344413	Lipstone et al.	Nov 2014	A1
20150020043	Ravindran et al.	Jan 2015	A1
20150049634	Levchuk et al.	Feb 2015	A1
20150057948	Reid et al.	Feb 2015	A1
20150149879	Miller et al.	May 2015	A1
20150154269	Miller et al.	Jun 2015	A1
20150163097	Lipstone et al.	Jun 2015	A1
20150167085	Salomon et al.	Jun 2015	A1
20150176080	Zucman-Rossi et al.	Jun 2015	A1
20150180724	Varney et al.	Jun 2015	A1
20150180725	Varney et al.	Jun 2015	A1
20150180971	Varney et al.	Jun 2015	A1
20150186789	Guo et al.	Jul 2015	A1
20150193583	McNair et al.	Jul 2015	A1
20150207695	Varney et al.	Jul 2015	A1
20150254331	Long et al.	Sep 2015	A1
20150269244	Qamar et al.	Sep 2015	A1
20150286759	Rehtanz et al.	Oct 2015	A1
20150363215	Versteeg et al.	Dec 2015	A1
20160013773	Dourbal	Jan 2016	A1
20160034640	Zhao et al.	Feb 2016	A1
20160034809	Trenholm et al.	Feb 2016	A1
20160085754	Gifford et al.	Mar 2016	A1
20160098519	Zwir	Apr 2016	A1
20160103932	Sathish et al.	Apr 2016	A1
20160171391	Guo et al.	Jun 2016	A1

Related Publications (1)

	Number	Date	Country
	20160364469 A1	Dec 2016	US

Provisional Applications (1)

	Number	Date	Country
	61087168	Aug 2008	US

Continuations (4)

	Number	Date	Country
Parent	14672430	Mar 2015	US
Child	15186063		US
Parent	14217939	Mar 2014	US
Child	14672430		US
Parent	13628559	Sep 2012	US
Child	14217939		US
Parent	12538835	Aug 2009	US
Child	13628559		US

Abstract