This invention is generally directed to a method of measuring centrality of nodes of interest of a graph and ranking the nodes of interest based upon the centrality measure.
Graphs including nodes and edges are commonly used to represent very large data sets wherein the nodes represent data points and the edges connecting these nodes represent relationships between the data points. The following passage gives conventional notation and descriptions of basic properties of graphs that will be used going forward.
A node is also called a vertex, so the terms are used interchangeably. Let G=(V, E) denote a simple, undirected graph with a set of vertices V and a set of edges E, where n=|V| is the number of vertices and m=|E| is the number of edges. Recall that vertical bars surrounding a set denote the cardinality, or size, of that set. The notation (h, v) means there is an edge between vertex h and vertex v. Using set-builder notation, the adjacency set of a vertex v is denoted by N(v)={h ∈ V | (h, v) ∈ E}. This adjacency set N(v) is also known as the neighborhood of vertex v, hence any vertex h ∈ N(v), meaning h is in N(v), is a neighbor of v and so there exists an edge (h, v). The degree of a node in a graph is the number of nodes connecting to it, meaning the number of neighbors of that node. Then d(v)=|N(v)| denotes the degree of vertex v. The closed neighborhood of a vertex v is denoted by N+(v)=N(v) ∪ {v}, meaning the neighborhood set includes v. A triangle in a graph is a set of three vertices {u, v, w} such that there is an edge between each pair, hence there are edges (u, v), (u, w), (v, w). A triangle is the simplest non-trivial, fully-connected graph, also known as a clique. The total possible number of triangles in a graph is given by the binomial coefficient C(n, 3),
which is colloquially stated as "n choose 3" to mean the total number of unique combinations of three items. Let Δ(v) represent the number of triangles with v. This is known as the local triangle count of a vertex. The global triangle count is the total number of triangles in G and is denoted by Δ(G). The subset of neighbors of v that are in triangles with v is given by NΔ(v). Such neighbors are called triangle neighbors. The non-triangle neighbors of a vertex v are those neighbors that are not in triangles with v, the subset of which is given by N▪(v). The neighborhood of v is hence the union of triangle and non-triangle neighbors, N(v)=NΔ(v) ∪ N▪(v). It follows that N▪(v)=N(v)\NΔ(v).
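By way of non-limiting illustration, the following minimal Python sketch shows how the neighborhood sets defined above might be derived from an adjacency-set representation of a simple, undirected graph; the example graph, the function name, and the variable names are hypothetical and are not part of the claimed method.

```python
def neighborhoods(adj, v):
    """Return the open, closed, triangle, and non-triangle neighborhoods of v."""
    open_nbrs = set(adj[v])                  # N(v)
    closed_nbrs = open_nbrs | {v}            # N+(v) = N(v) U {v}
    # h is a triangle neighbor of v if h and v share at least one common neighbor.
    tri_nbrs = {h for h in open_nbrs if adj[h] & open_nbrs}
    non_tri_nbrs = open_nbrs - tri_nbrs      # N(v) \ N_triangle(v)
    return open_nbrs, closed_nbrs, tri_nbrs, non_tri_nbrs

# Example: a small graph stored as {vertex: set of neighbors}.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(neighborhoods(adj, 1))
```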
In graph theory and network analysis, measures of centrality identify the most important or influential vertices in a graph. Common centrality measures include degree, eigenvector, PageRank, and betweenness.
Degree centrality provides a measure of the number of connections associated with a node. This measure is very simple because it depends only on a node's degree (i.e., the number of connections associated with the node); however, this simplicity also means that degree centrality may not accurately model importance in a complex network.
Eigenvector centrality provides a measure of the number and quality of connections associated with a node. Important nodes under this measure are those with many connections to other important nodes. The advantage of this measure is that it gives more weight to connections from important nodes than to connections from unimportant nodes. The disadvantage is that it ignores longer-range interactions because only direct connections of a node are considered in an eigenvector centrality measure. An eigenvector centrality measure is also relatively expensive to compute because it takes O(n^3) time to compute eigenvectors.
PageRank centrality provides a measure of the number and quality of connections associated with a node and also includes a damping factor. The PageRank algorithm scores the importance of vertices based on both the quantity and quality of their links. PageRank is a variant of eigenvector centrality and therefore has similar advantages and disadvantages. A node with many low-quality connections may still be ranked highly under a PageRank centrality measure simply because it has high degree (i.e., many nodes connecting to it), making the measure susceptible to spurious results or "gaming" of the ranks. The PageRank algorithm was published by Page and Brin in 1998 and made popular by its use for ranking websites for Google Search.
Betweenness centrality provides a measure of the number of shortest-paths a node intersects. The betweenness centrality algorithm calculates the fraction of shortest-paths a vertex intersects over all possible shortest-paths. This implies that an important or central vertex in a graph under a betweenness centrality measure is one whose removal will disrupt the flow of information to many other vertices. The betweenness centrality measure takes into account longer-range interactions that are not considered in PageRank and other eigenvector-based centrality measures. Although a betweenness centrality measure relies heavily on distances, it does not capture local subnetwork characteristics. For example, if a graph has two highly connected subgraphs that are bridged by a single node, then that single node is important under a betweenness centrality measure despite having the fewest number of connections. It therefore is less suitable for identifying important vertices that have many connections within the subgraphs. The betweenness centrality measure is also very expensive to compute because it requires finding the shortest-paths between all pairs of vertices, which can take O(mn) time.
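For context, the following is a minimal sketch of how these conventional centrality measures might be computed with the NetworkX library referenced later in this disclosure; the example graph and variable names are assumptions used only for illustration.

```python
# Hypothetical sketch: computing the conventional centrality measures
# discussed above with NetworkX on an arbitrary example graph.
import networkx as nx

G = nx.karate_club_graph()                  # any undirected example graph

degree      = nx.degree_centrality(G)       # connections only
eigenvector = nx.eigenvector_centrality(G)  # number and quality of connections
pagerank    = nx.pagerank(G)                # eigenvector variant with damping
betweenness = nx.betweenness_centrality(G)  # shortest-path based

# Report the top-ranked vertex under each measure.
for name, scores in [("degree", degree), ("eigenvector", eigenvector),
                     ("pagerank", pagerank), ("betweenness", betweenness)]:
    print(name, "->", max(scores, key=scores.get))
```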
In many instances, these centrality measures do not successfully identify important nodes in networks that have strong social network properties and organizational hierarchies where both direct and indirect connectivity need to be accounted for. A key property of these social/hierarchical networks is the presence of many triangles. For example, the Friendster social network dataset (an on-line gaming network), available at the Stanford Network Analysis Project (http://snap.stanford.edu/data), is a graph of over 65 million vertices but has over 4 billion triangles. In social network analysis, triangles have been found to indicate network cohesion and clustering. If there is a social network in which every pair of nodes is connected, so that the graph is a clique, then any individual can contact any other individual in a single step. This represents maximum cohesion, and the graph would have the maximum number of triangles possible, e.g., there would be C(n, 3) triangles.
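The clique property can be checked with a short, hypothetical snippet (the graph constructor and example values below are illustrative assumptions): a complete graph on n vertices contains exactly C(n, 3) triangles.

```python
# Hypothetical check: a clique (complete graph) on n vertices has C(n, 3) triangles.
import networkx as nx
from math import comb

n = 10
K = nx.complete_graph(n)                         # clique on n vertices
triangles = sum(nx.triangles(K).values()) // 3   # each triangle is counted at 3 vertices
print(triangles, comb(n, 3))                     # both print 120 for n = 10
```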
In a social network, a pair of nodes that share a common neighbor implies a minimal degree of mutual association between the pair of nodes. But if that same pair are themselves connected, then there is much stronger cohesion between all nodes in that triangle. Consider an email spammer that sends thousands of emails to random individuals. Almost any pair of the email recipients likely has no other association aside from having received an email from the spammer. This should not imply that the spammer is important, especially with respect to each of the recipients. Now consider a social network. A message from an individual to all of his or her direct contacts has more relevance than a message from the email spammer because the recipients are mutually associated rather than just random individuals receiving an email. Hence, almost any pair of the friend recipients has some existing association before receiving the same email. This suggests that the email recipients were not just randomly selected for an email message. Current centrality measures do not account for these relationships between the neighboring nodes.
Hierarchies may also have many triangles. The top of the hierarchy in an organization may have very few contacts, but the subordinate contacts are likely connected because they must share similar information or orders handed down by the top. In turn, these subordinates may have many mutual associates that are also members of triangles. Thus, the top node in the hierarchy could have few or no triangles, but if there are many triangles concentrated around its direct contacts and the contacts of the direct contacts, it suggests that the top node is important in the network. For example, if the top node in an organizational hierarchy is not associated with any triangles, but a direct contact of the top node is associated with many triangles, then that direct contact is in effect conferring the support of its triangle members to the top node; i.e., the top node is receiving support from nodes that are two steps away. Thus, the importance of a node is due to the concentration of triangles that surround the node even when the node itself is not involved in many triangles. This indirect support is not modeled by centrality measures such as betweenness and closeness centrality.
Another consideration is the relationship of neighboring nodes. As an example, a first and a second node may share a mutual contact. If the first and second nodes are themselves connected, a stronger relationship with the mutual contact exists than if the first and second nodes are not connected. In a real-world scenario, this could mean that the first and second nodes have conferred and agreed upon the selection of the mutual node as a contact. Centrality measures such as PageRank and degree centrality do not account for this indirect connectivity (i.e., the connectivity between the first and second nodes). Rather, in PageRank and degree centrality measures, the importance of a node depends upon the connection between a node and its neighbors regardless of whether connections exist amongst the neighbors. Thus, the email spammer described earlier can incorrectly appear as an important node under the PageRank and degree centrality measures.
A new centrality measure that accounts for the concentration of triangles in the local subgraph of each vertex is therefore needed in order to successfully identify important nodes.
Briefly, the present invention discloses methods for measuring triangle graph centrality and methods for ranking nodes of a graph based upon the triangle graph centrality measures. The nodes of the graph represent data elements in a network and the measures of triangle graph centrality are used to identify the most important/influential nodes (i.e., the most important or influential data elements) in the network.
The organization and manner of the structure and operation of the invention, together with objects and advantages thereof, may best be understood by reference to the following description, taken in connection with the accompanying drawings, wherein like reference numerals identify like elements in which:
While the invention may be susceptible to embodiment in different forms, there is shown in the drawings, and herein will be described in detail, a specific embodiment with the understanding that the present disclosure is to be considered an exemplification of the principles of the invention, and is not intended to limit the invention to that as illustrated and described herein.
Often the nodes and edges within a graph form triangles. The presence of many triangles within a graph suggests that there are strong or close-knit ties between the data points; that there are mutual associations between the data points; and that the data points represent a community structure. The invention provides triangle graph centrality measures based on the following principles/assumptions: (1) a vertex connected to neighbors having high triangle counts is important; (2) if many vertices with mutually-connected neighbors are also connected to a common vertex, that common vertex is important; and (3) influence more easily spreads in networks with many triangles because there are more edges and therefore more pathways to reach all nodes. In addition, importance is due to the concentration of triangles that surround a node, but that node itself need not be involved in many triangles. The invention further provides a method of ranking connected nodes of a graph G based upon the triangle graph centrality measures. The nodes of the graph G represent data elements in a network and the measure of triangle graph centrality is used to identify the most important/influential nodes (i.e., data elements) in the network.
A clique is a set of vertices in a graph wherein each pair of the vertices of the set is joined by an edge. A clique represents the strongest cohesion of vertices in a graph. Thus any vertex in a clique can quickly contact or influence every other vertex because of the proximity of being one step away. A clique also has the maximum number of triangles possible among the vertices in the clique. A clique of n vertices has C(n, 3) triangles. For example, a clique 2 having ten vertices 4 is illustrated in
An example graph (G) 10 representing a very small data set is illustrated in
The triangle graph centrality measures of the present invention are centered on various triangle counts, and various prior art tools may be used to determine particular triangle counts. For example, Neo4j (www.neo4j.com) and Networkx (networkx.github.io) are tools which may be used to determine global and local triangle counts. Various triangle counts are defined as follows:
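As a minimal, non-limiting sketch of how such a tool might be used, the following Python snippet obtains local and global triangle counts with NetworkX; the example graph is an assumption standing in for the graph G of the figures.

```python
# Hypothetical sketch: local and global triangle counts with NetworkX.
import networkx as nx

G = nx.karate_club_graph()                # example stand-in for graph G

local = nx.triangles(G)                   # dict mapping each vertex to Delta(v)
global_count = sum(local.values()) // 3   # Delta(G): each triangle is counted at 3 vertices

print(local[0], global_count)
```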
The following dictionary summarizes the various neighborhood triangle sums that will be used throughout this discussion.
Various triangle graph centrality measures are provided by the invention to measure the importance/significance of a node of interest within a graph. Specifically, neighborhood triangle centrality is illustrated in
The neighborhood triangle centrality measure based on a neighborhood triangle sum, Sh, illustrated in
A method 300 of measuring neighborhood triangle centrality is illustrated in
Next at step 304, a node of interest, v, for which the neighborhood triangle centrality is to be measured is selected. As an example, we will discuss the neighborhood triangle graph centrality measure for selected node of interest, V1 of
Next, at step 306 the neighboring nodes, h, of the node of interest are identified. In
Next at step 310, a neighborhood triangle sum is determined. As represented by steps 310a and 310b, the neighborhood triangle sum may be either a closed neighborhood triangle sum, Sh+, provided by Σh∈N+(v) Δ(h), or the neighborhood triangle sum may be an open neighborhood triangle sum, Sh, provided by Σh∈N(v) Δ(h). The closed neighborhood triangle sum, Sh+, is determined by summing the triangle counts associated with the node of interest and the triangle counts associated with each neighboring node h. The open neighborhood triangle sum, Sh, is determined by summing the triangle counts associated with each neighboring node h. For example, using V1 as the node of interest, the closed neighborhood triangle sum is:
Alternatively, using V1 as the node of interest, the open neighborhood triangle sum is:
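Although the worked values for V1 depend on the example graph of the figures, the following minimal Python sketch (the function name and example graph are assumptions) illustrates how the open and closed neighborhood triangle sums of steps 310a and 310b might be computed from the NetworkX triangle counts described above.

```python
# Hypothetical sketch of step 310: open and closed neighborhood triangle sums.
import networkx as nx

def neighborhood_triangle_sums(G, v, tri=None):
    tri = tri if tri is not None else nx.triangles(G)  # Delta(h) for every vertex h
    s_open = sum(tri[h] for h in G.neighbors(v))       # Sh  : sum over N(v)
    s_closed = s_open + tri[v]                         # Sh+ : sum over N+(v)
    return s_open, s_closed

G = nx.karate_club_graph()
print(neighborhood_triangle_sums(G, 0))
```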
Next at step 312, the neighborhood triangle sum may optionally be normalized. Although other methods of normalization may be used, two methods of normalization are illustrated and described herein.
As illustrated at step 314a, the neighborhood triangle sum may be normalized using the total possible neighborhood triangle sum. Specifically, the sum of the possible triangle counts for all neighbors is calculated. Since the possible triangle count for any vertex v is given by its "degree choose 2" binomial coefficient, i.e., C(d(v), 2),
the total possible neighborhood triangle sum for a closed neighborhood is Σh∈N+(v) C(d(h), 2).
For example, using V1 as the node of interest and summing in order V1, V2, V3, V4, V5, the total possible closed neighborhood triangle sum is:
The normalized closed neighborhood triangle sum Sh+ is therefore provided by the closed neighborhood triangle sum divided by the total possible closed neighborhood triangle sum:
Using node V1 of the
Alternatively, the total possible neighborhood triangle sum for an open neighborhood is Σh∈N(v) C(d(h), 2).
For example, using V1 as the node of interest and summing in order V2, V3, V4, V5, the total possible open neighborhood triangle sum is:
The normalized open neighborhood triangle sum Sh is therefore provided by the open neighborhood triangle sum divided by the total possible open neighborhood triangle sum:
Using node V1 of the
As illustrated at step 314b, the neighborhood triangle sum may be normalized using the global triangle count. Using the simple graph 10 of
For example, using node V1 of the
Alternatively, the normalized open neighborhood triangle sum Sh is provided by the open neighborhood triangle sum divided by the global triangle count:
Using node V1 of the
In yet another alternative, the neighborhood triangle sum may be normalized using a clustering coefficient, either inclusive or not inclusive of v. The clustering coefficient cc(v) for a vertex v is given by cc(v) = Δ(v)/C(d(v), 2).
The neighborhood triangle sum normalized using the clustering coefficient is then,
CNTSN(v)=Σh∈N+(v) cc(h), or
ONTSN(v)=Σh∈N(v) cc(h).
In this form, the contribution from each vertex is locally normalized by its total possible triangles rather than normalizing the contribution across the sum of possible triangle counts from all neighbors. In the case where all neighbors are in separate cliques and therefore have nearly the maximum possible clustering coefficient, their contribution to the rank of a vertex is primarily independent of the other neighbors. Dividing this version of the neighborhood triangle sum by d(v) or d(v)+1, depending on the inclusion choice, helps to standardize the rank by how many neighbors the vertex of interest v may have.
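As a non-limiting illustration of the normalization options of steps 314a and 314b and of the clustering-coefficient variant just described, the following Python sketch (function and variable names are assumptions) computes the three normalized forms for a single vertex.

```python
# Hypothetical sketch: three ways to normalize the closed neighborhood triangle sum.
from math import comb
import networkx as nx

def normalized_neighborhood_sums(G, v):
    tri = nx.triangles(G)
    nbrs = list(G.neighbors(v))
    s_closed = tri[v] + sum(tri[h] for h in nbrs)            # Sh+

    # (a) Normalize by the total possible closed neighborhood triangle sum,
    #     i.e., the sum of "degree choose 2" over N+(v).
    possible = comb(G.degree(v), 2) + sum(comb(G.degree(h), 2) for h in nbrs)
    by_possible = s_closed / possible if possible else 0.0

    # (b) Normalize by the global triangle count Delta(G).
    global_count = sum(tri.values()) // 3
    by_global = s_closed / global_count if global_count else 0.0

    # (c) Clustering-coefficient form: each vertex of N+(v) contributes cc(h),
    #     standardized by the closed-neighborhood size d(v) + 1.
    cc = nx.clustering(G)
    by_cc = sum(cc[h] for h in nbrs + [v]) / (G.degree(v) + 1)

    return by_possible, by_global, by_cc

print(normalized_neighborhood_sums(nx.karate_club_graph(), 0))
```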
As illustrated in
At step 318, the neighborhood triangle sums (or normalized neighborhood triangle sums) are compared and the nodes of interest are ranked.
At step 320 a list of ranked nodes is provided.
The method 400 of measuring core triangle centrality and ranking nodes is illustrated in
Next at step 404, a node of interest for which core triangle centrality is to be measured is selected. For example, we will discuss the core triangle centrality measure for selected node of interest V1 of
Next at step 406, the neighboring nodes of the node of interest are identified and each neighboring node is categorized as a triangle-neighbor, u, or a non-triangle-neighbor, w. In
Next at step 412, the core-neighborhood triangle sum is determined. The core-neighborhood triangle sum is the sum of the local triangle counts for the triangle-neighbors u. As illustrated at step 412a, the core-neighborhood triangle sum may be a closed core-neighborhood triangle sum, Su+, provided by Δ(v) + Σu∈NΔ(v) Δ(u).
For example, using V1 as the node of interest, the closed core-neighborhood triangle sum, Su+, is:
Alternatively, as illustrated at step 412b, the core-neighborhood triangle sum may be an open core-neighborhood triangle sum, Su:
Σu∈NΔ(v) Δ(u).
For example, using V1 as the node of interest, the open core-neighborhood triangle sum is:
Next, at step 414, the core-neighborhood triangle sum is weighted ("weighted Su"). For example, the core-neighborhood triangle sum may be mitigated or enhanced. In the case of mitigation, for example, over-counting from triangle neighbors may be avoided by dividing the core-neighborhood triangle sum by three, since the same triangle is counted thrice. Thus, a mitigated closed core-neighborhood triangle sum may be provided by:
Alternatively, a mitigated open core-neighborhood triangle sum is provided by:
Although mitigation has been described as dividing the core-neighborhood triangle sum by three, other means of mitigation are permissible, such as dividing by another number. Still other means of mitigating the over-count are permitted, such as dividing by three only the triangles that are incident upon both neighbor u and the node of interest, since u may have triangles that do not involve the node of interest.
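As a minimal, non-limiting sketch of steps 406-414 (categorizing triangle-neighbors and computing a weighted core-neighborhood triangle sum), the following Python snippet applies the divide-by-three mitigation described above; the function name, default weight, and example graph are assumptions.

```python
# Hypothetical sketch of steps 406-414: weighted (mitigated) core-neighborhood triangle sum.
import networkx as nx

def weighted_core_sum(G, v, closed=True, weight=1.0 / 3.0):
    tri = nx.triangles(G)
    nbrs = set(G.neighbors(v))
    # Triangle-neighbors: neighbors of v that share at least one neighbor with v.
    tri_nbrs = {u for u in nbrs if set(G.neighbors(u)) & nbrs}
    core = sum(tri[u] for u in tri_nbrs)
    if closed:
        core += tri[v]            # closed core sum also counts Delta(v)
    return weight * core          # weight < 1 mitigates, weight > 1 enhances

print(weighted_core_sum(nx.karate_club_graph(), 0))
```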
Weighting of the core-neighborhood triangle sum may also be provided by enhancing the core-neighborhood triangle sum. For example, the contribution from triangle neighbors u can be given more weight by multiplying the core-neighborhood triangle sum by some number greater than one.
Next at step 416, the non-triangle neighborhood triangle sum, Sw, is determined. Specifically, the local triangle counts of the non-triangle-neighbors, w, are summed to provide the non-triangle-neighborhood triangle sum, Sw. The non-triangle neighborhood triangle sum is:
Σw∈N▪(v) Δ(w).
Recall that N▪(v)=N(v)\NΔ(v) denotes the set difference between sets N(v) and NΔ(v), leaving the set of neighbors of v that are in set N(v) but not in the set NΔ(v); i.e., the non-triangle neighbors, w. For example, using V1 as the node of interest, the non-triangle neighborhood triangle sum is:
At step 418, core triangle centrality is calculated by summing the weighted core-neighborhood triangle sum, weighted Su (either closed or open), and the non-triangle neighborhood triangle sum, Sw. Using a weighted closed core-neighborhood triangle sum where the weighting is denoted by X, for example, the core triangle centrality (CTC) is provided by:
Optionally, at step 420, the core triangle centrality (CTC) for each node of interest may be normalized by dividing the core triangle centrality for each node of interest by the global triangle count to provide a normalized core triangle centrality. For example, a normalized core triangle centrality measure (CTCN) is provided by CTCN(v) = CTC(v)/Δ(G).
Using node V1 of the
As noted at step 422, steps 404-420 are performed for each node of interest selected. Although steps 404-420 may be performed sequentially for each node of interest, preferably steps 404-420 are performed simultaneously and independently for each node of interest selected. In either case, sequential or parallel, the total number of rounds is a constant regardless of the size of the graph or the number of triangles. When steps 404-420 are simultaneously performed for each of the selected nodes, the entire computation for all vertices may be completed in three rounds: one round for computing all triangle counts, a second round for computing the weighted core-neighborhood triangle sums, and, finally, a third round for computing core triangle centrality.
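A minimal, sequential Python sketch of steps 404-426, combining the weighted core-neighborhood triangle sum, the non-triangle neighborhood triangle sum, normalization by the global triangle count, and ranking, is given below; the function name, weight, and example graph are assumptions and the sketch is illustrative only.

```python
# Hypothetical end-to-end sketch: normalized core triangle centrality and ranking.
import networkx as nx

def core_triangle_centrality(G, weight=1.0 / 3.0):
    tri = nx.triangles(G)                               # Delta(v) for every vertex
    global_count = sum(tri.values()) // 3 or 1          # Delta(G); avoid dividing by zero
    scores = {}
    for v in G:
        nbrs = set(G.neighbors(v))
        tri_nbrs = {u for u in nbrs if set(G.neighbors(u)) & nbrs}
        core = tri[v] + sum(tri[u] for u in tri_nbrs)   # closed core-neighborhood sum
        non_tri = sum(tri[w] for w in nbrs - tri_nbrs)  # non-triangle neighborhood sum
        scores[v] = (weight * core + non_tri) / global_count
    return sorted(scores, key=scores.get, reverse=True) # nodes ranked highest first

print(core_triangle_centrality(nx.karate_club_graph())[:5])
```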
At step 424, the core triangle centrality measure (or optionally the normalized core triangle centrality measure) for each node of interest is compared to the remaining core triangle centrality measures and each node is ranked. For example, the core triangle centrality measures are ordered from the highest value to the lowest value and the node associated with the highest value is identified as the most relevant node/vertex.
At step 426 a list of ranked nodes is provided.
A simple graph 500 of
Observing graph 500 and applying the principles described above, namely that (1) a vertex connected to neighbors having high triangle counts is important; (2) if many vertices with mutually-connected neighbors are also connected to a common vertex, that common vertex is important; and (3) influence more easily spreads because the presence of triangles coincides with a denser network, meaning more pathways for influence, it is observed that node a should be the highest-ranked node of the graph. As illustrated in Table A, however, only core triangle centrality and betweenness centrality correctly provide node a with the highest ranking.
A simple graph 600 of
Observing graph 600 and applying the principles described above, namely that (1) a vertex connected to neighbors having high triangle counts is important; (2) if many vertices with mutually-connected neighbors are also connected to a common vertex, that common vertex is important; and (3) influence more easily spreads because the presence of triangles coincides with a denser network, meaning more pathways for influence, it is observed that node a should be the highest-ranked node of the graph. As illustrated in Table B, however, only core triangle centrality and eigenvector centrality correctly provide node a with the highest ranking.
A simple graph 700 of
Observing graph 700 and applying the principles described above, namely that (1) a vertex connected to neighbors having high triangle counts is important; (2) if many vertices with mutually-connected neighbors are also connected to a common vertex, that common vertex is important; and (3) influence more easily spreads because the presence of triangles coincides with a denser network, meaning more pathways for influence, it is observed that node a should be the highest-ranked node of the graph. As illustrated in Table C, however, only triangle centrality correctly provides node a with the highest ranking.
As illustrated in Tables A-C, core triangle centrality measures find important vertices that are missed by other centrality measures. Unlike other centrality measures, core triangle centrality recognizes the importance of a node based upon mutual connections with well-connected neighbors.
In addition to the ability to find important nodes not found using other graph centrality measures, triangle graph centrality provides a fast runtime with a constant number of steps. The computations for triangle graph centrality are non-iterative and may therefore be performed directly, thereby decreasing the time required to rank the nodes of interest in comparison with other centrality-based rankings.
The time complexity of triangle graph centrality is asymptotically equivalent to that of triangle counting. Recall that time complexity refers to the work performed as a function of the input size; it relates to the performance or runtime. The asymptotic bounds for time complexity are written using Landau notation, colloquially known as "Big-Oh" notation. Hence, if the worst-case time complexity is quadratic with respect to the input size n, then it is written as O(n^2) time, where O represents the asymptotic upper bound, ignoring constants.
A first stage in computing core triangle centrality is to get the local triangle count of every vertex in the graph. This step can be achieved by any triangle counting algorithm and takes O(m^(3/2)) time; recall m represents the number of edges in the graph. The processes of determining graph centrality described in connection with
The second stage in measuring core triangle centrality is to identify the open or closed core-neighborhood triangle sum for each vertex. Note, as discussed below, this step can be skipped for the neighborhood triangle centrality measure. Recall that the core-neighborhood triangle sum of a vertex is the sum of the triangle counts of the triangle neighbors of the selected vertex, meaning only the neighbors that participate in a triangle with that vertex are included in the core-neighborhood triangle sum.
The core-neighborhood triangle sum is computed using the following process. Let a vertex u be higher-ordered than a vertex v if d(u)>d(v) or, in the case of a tie, if u>v, meaning the label of u is greater than the label of v.
For every higher-ordered neighbor h of a vertex v, check if there is at least one common neighbor between h and v. This is accomplished by checking if there is an (h, x) edge for any neighbor x of v. If we find just one common neighbor we can stop checking, because then we know that h and v are triangle neighbors. The common neighbor check is only needed if h is higher-ordered than v. The reason is that a vertex cannot have more than √m neighbors having higher degree (if d(v)≥√m and each of v's high-degree neighbors u also has d(u)≥√m, then there would be more than m edges in the graph, which is not possible). Therefore we perform O(√m·d(v)) checks for every vertex. Checking if there is an (h, x)∈E edge is possible in O(1) time using a suitable data structure such as a matrix or, if space is a concern, an array or hash table of edges; alternatively, each neighborhood can be stored as a hash table so an equivalent check is determining if h is in the neighborhood set of x. In total, finding the triangle neighbors of every vertex is possible in O(Σv∈V √m·d(v)) = O(m^(3/2)) time, because each vertex compares its neighborhood set N(v) no more than √m times for the common neighbor checks, and each N(v) has size d(v). Thus finding triangle neighbors for every vertex takes the same time as counting triangles.
Continuing in this second stage of finding the core-neighborhood triangle sum for each vertex, once a triangle neighbor u of v is determined, we update another array indexed by vertex labels with the triangle neighborhood counts for u and v. The effect of this is that each lower-ordered vertex in a triangle will give its triangle count to the higher-ordered vertices in that triangle. Thus at the end of this step, each vertex will have the sum of triangle counts from its triangle-neighbors, as desired. This array is titled the "core-neighborhood triangle sum array". Hence, when checking all neighbors u of v having higher degree (or higher label in case of a tie), for each u that is a triangle neighbor we add the triangle count of v to the existing count in the core-neighborhood triangle sum array indexed at u, and then update the count indexed at v with the triangle count of u. Each update in this core sum array takes O(1) time because looking up the triangle count in the previous "triangle count" array takes O(1) time and adding two numbers and updating the result in the core sum array also takes O(1) time.
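The following Python sketch illustrates this second stage under stated assumptions (comparable vertex labels, NetworkX triangle counts, and a dictionary standing in for the core-neighborhood triangle sum array); it is an illustrative implementation, not the only possible one.

```python
# Hypothetical sketch of the second stage: find triangle-neighbor pairs by checking
# only higher-ordered neighbors, then accumulate core-neighborhood triangle sums.
import networkx as nx

def core_neighborhood_sums(G):
    tri = nx.triangles(G)
    order = lambda x: (G.degree(x), x)       # order by degree, then by label on ties
    core = {v: 0 for v in G}                 # "core-neighborhood triangle sum array"
    for v in G:
        nbrs = set(G.neighbors(v))
        for h in nbrs:
            if order(h) <= order(v):
                continue                     # only check higher-ordered neighbors of v
            if set(G.neighbors(h)) & nbrs:   # one common neighbor => triangle neighbors
                core[h] += tri[v]            # lower-ordered v gives its count to h...
                core[v] += tri[h]            # ...and receives the count of h in turn
    return core

print(core_neighborhood_sums(nx.karate_club_graph())[0])
```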
Next in this second stage, the core-neighborhood triangle sum is weighted. Weighting is a simple arithmetic operation and takes O(1) time, hence weighting is free.
Now in the third stage, for each vertex v, add the triangle counts of every neighbor u of v. Subtract from this sum the value for v stored in the core-neighborhood triangle sum array described previously to get the final triangle summation from non-triangle neighbors. Now add the triangle count of v to its core-neighborhood triangle sum value to get the final, closed core-neighborhood triangle sum for v. Divide this core sum by three, add to it the non-triangle neighbor triangle sum, and divide by the total count of triangles in the graph to get the normalized triangle centrality of v. Performing these operations for every vertex takes the same time as summing over the degrees of every vertex and hence takes O(m) time. This runtime is less than the time to count triangles and therefore can be ignored. The overall time to compute the triangle centrality of all vertices is therefore asymptotically equivalent to the time to count triangles, hence O(m^(3/2)) time.
Table D below provides a summary of the runtime associated with measuring global triangle centrality and other graph centrality measures including PageRank, eigenvector, betweenness, and degree centrality.
Both PageRank and eigenvector centrality compute the eigenvalues. Using the Power Iteration method, this takes O(n^3) time. Given a sparse graph where m is on the order of n, triangle centrality takes O(m^(3/2)) = O(n^1.5) time, which is considerably faster than the time needed to compute the eigenvalues used by PageRank and eigenvector centrality.
The runtime of betweenness centrality is O(mn) time because it must compute all the shortest-paths. The shortest-paths from a vertex to every other vertex can be found using Breadth-First Search (BFS), which takes O(m+n) time. Hence, from every vertex, all shortest-paths, and therefore betweenness centrality, can be computed in O(n(m+n))=O(mn) time. This is slower than computing triangle centrality on sparse graphs because it takes O(n^2) time as opposed to O(n^1.5) time for triangle centrality.
Finally, the time complexity of degree centrality is equivalent to summing all the degrees in the graph, which takes O(m) time. Thus, triangle centrality is a factor of √m slower than computing degree centrality.
Thus, as illustrated in Table D, the time required to calculate triangle centrality is faster than the time required to calculate PageRank centrality or eigenvector centrality on sparse graphs, but at worst it takes the same time. The time required to calculate triangle centrality is faster than the time required to calculate betweenness centrality by a factor of √n on sparse graphs, and the time required to calculate degree centrality is faster than the time required to calculate triangle centrality by a factor of √m. Although more time is needed to calculate triangle centrality than to calculate degree centrality, as illustrated in Tables A-C, triangle centrality consistently and accurately identifies important nodes whereas degree centrality does not. Although the time needed to calculate triangle centrality is the same as the time needed to calculate PageRank and eigenvector centrality in the worst case, for sparse graphs it is faster, and, as illustrated in Tables A-C, triangle centrality consistently and accurately identifies important nodes whereas PageRank and eigenvector centrality do not.
As noted above, neighborhood triangle centrality and core triangle centrality each provide advantages over the prior art centrality measures, for example, the ability to find important nodes not found using other graph centrality measures, a fast runtime with a constant number of steps, and non-iterative computations that may be performed directly. The selection of the particular graph triangle centrality measure to be used (neighborhood triangle centrality or core triangle centrality) will depend upon the characteristics of the particular data set, whether weighting of the triangle neighborhood is desired, and whether the additional run time required for core triangle centrality is acceptable to the user.
While embodiments of the present invention are shown and described, it is envisioned that those skilled in the art may devise various modifications of the present invention without departing from the spirit and scope of the appended claims.