Ser. No. 13/317,270, “A method for calculating distances between users in a social graph”, Oct. 13, 2011, pending, Zhijiang He
The present invention relates generally to techniques for searching multiple social graphs, in which some nodes in one social graph may be mapped to nodes in other social graphs. More specifically, it relates to calculating the proximities between nodes in multiple social graphs.
Because of the number of users and the amount of time each user may spend on it every day, online social networking has become increasingly important in people's lives. Large databases of social connections, i.e. social graphs, have been established. A node in a social graph represents a user or an entity. A connection between two nodes represents a relation between the two corresponding users/entities.
There are a variety of social graphs. For instance, the social graphs of Facebook and Google+ represent friendship between users. The social graph of LinkedIn represents professional links between users. The social graph of Twitter represents following relations between users.
Social graphs representing the same type of relation may be merged into one social graph. For instance, the social graphs of Facebook and Google+ may be merged into one friendship graph, which describes the friendship between users more accurately and completely.
Social graphs may also be used to describe business relations. A medical social graph may describe relations between doctors and patients. A commercial social graph may describe relations between buyers and sellers.
The relations in popular social graphs are dense. A node in a popular social graph may have hundreds of connections. According to the six degrees of separation, any two users in the social graph of a popular social networking service may be linked through, on average, about five intermediate users.
Nonetheless, some social graphs are sparse. For instance, a seller may have a limited number of buyers. Conversely, a buyer may have only a limited choice of sellers. It is a challenge for both buyers and sellers to find more options. Sellers may use various approaches to connect to potential buyers. Meanwhile, buyers also want more choices for better products and services.
In real life, a buyer may ask his/her friends to refer sellers, and those friends may in turn ask their own friends. Likewise, to sell more products/services, a seller may ask customers to recommend the products/services to their friends. In this real-life example, relations in a friendship social graph are used to find possible new business relations in a business social graph.
This phenomenon serves as the foundation for search in multiple social graphs. The goal of a search in multiple social graphs is to find a list of matched nodes using relations in the social graphs.
Common methods for searching social graphs include breadth-first search and similar approaches. Unfortunately, a node in a social graph may have hundreds of connections. This large branching factor may dramatically increase the computation cost, particularly for search in multiple social graphs.
To handle the large branching factor problem, the nodes in social graphs may be sorted in terms of closeness of relation with respect to the source nodes. Nodes with closer relation to the source nodes are searched first. Moreover, the scope of the search may also be constrained.
Proximities may be used to describe the closeness of relation between nodes in social graphs. To calculate proximities, weighting factors are assigned to relations in the social graphs. The proximities between two nodes in social graphs may be calculated from the weighting factors for relations on the paths connecting the two nodes in the social graphs.
Accordingly, it is an object of this invention to calculate the proximities between nodes in multiple social graphs to facilitate search in multiple social graphs.
The present invention provides a method for calculating proximities between nodes in multiple social graphs. The relations in a social graph may have distinct importance. Therefore, weighting factors are assigned to the relations in a social graph. The proximities between nodes describe the closeness of relation in the social graphs. Larger proximity from one node to the other means closer relation between the two nodes. The proximities between nodes may be calculated from the weighting factors for relations on the paths connecting the two nodes. If two nodes have no path connecting them, then the proximity between them is zero. According to the calculated proximities from one or more source nodes, search in social graphs may be performed in the order of non-increasing proximities from the source nodes.
A person in real life may know hundreds of people. Nonetheless, he/she may have close relations with only very few of them. His/her relations with the remaining friends may be relatively looser. In other words, a person's friends are tiered. This is also true for the relations of a node in social networking. This phenomenon serves as the theoretical foundation for calculating proximities between nodes in social graphs. If two nodes have close direct/indirect relation, the proximities between the two nodes are also large. The concept of proximity makes it possible to measure the closeness of relation across neighbors in social graphs.
A search in multiple social graphs may be performed in the order of non-increasing proximities from the source nodes. The search scope may be constrained with a predetermined cutoff proximity. Nodes with smaller proximities from the source nodes than the predetermined cutoff proximity will not be searched.
Social graphs representing the same type of relation may be merged into one graph. The importance ranks for nodes and the weighting factors for relations in the merged graph may be derived from the importance ranks for nodes and the weighting factors for relations in the original graphs. In this way, the merged graph may model the relations between nodes more accurately and completely.
When propagated along a path in social graphs, some relations may be attenuated. Relations having this propagation attribute are defined as attenuatable relations. For instance, relations in a friendship social graph are attenuatable. A propagated friendship between two strangers sharing a common friend may not be as close as their friendship with their common friend. In other words, the proximities between two strangers sharing a common friend may be smaller than their proximities from their common friend.
Nonetheless, some relations may not be attenuated when propagated along a path. These relations are defined to be non-attenuatable relations. For instance, the relation between a customer and a restaurant is non-attenuatable. A customer may have a high opinion about a restaurant. Upon the customer's recommendation, his/her friends who have never visited the restaurant may also have a high opinion about the restaurant. The relation between the customer and the restaurant is not attenuated when propagated to his/her friends. That is, the proximities between the customer's friends and the restaurant may be equal to the proximities between the customer and the restaurant.
Proximities between two nodes in social graphs may be calculated from the weighting factors for relations on the paths connecting the two nodes. The methods for calculating proximities of non-attenuatable relation between two nodes are distinct from the methods for calculating proximities of attenuatable relation between two nodes.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent to one skilled in the art, however, that the present invention may be practiced without these specific details. Accordingly, the following embodiments of the invention are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
Consider n undirected social graphs Gk(Vk, Ek), where k ∈ [0, n−1]. Vk represents the set of nodes in Gk and Ek represents the set of edges connecting the nodes in Vk. Essentially, Vk is the set of users/entities in a social networking service and Ek describes the relations between the users/entities. Nodes in one graph Gi may be mapped to nodes in another graph Gj, where i, j ∈ [0, n−1]; that is, Gi and Gj share some common nodes.
The multiple social graphs may be obtained from various social networking services. However, a single social networking service may also maintain multiple social graphs, each representing one type of relation between nodes.
Nodes in social graphs represent entities registered with social networking services, including but not limited to users, celebrities, public figures, artists, bands, groups, companies, businesses, organizations, institutions, places, events, brands, products and services.
Each node vi in a social graph G is assigned an importance rank ri. In one embodiment of the invention, an importance rank may be determined from a node's profile, join time, last access time, activities, locations, interests, membership of groups, events and preferences.
Part of the value of a social graph lies in the closeness of relation it conveys. Although a node may have hundreds of connections, these connections may carry disparate levels of closeness. In one embodiment of the present invention, family relations carry a high level of trust. In another embodiment of the invention, the more two nodes communicate, the closer the relation between them may be.
To model the closeness of relation between nodes, the present invention assigns weighting factors to the relations in a social graph. For a relation eij in graph G(V, E), wij is used to describe the closeness of relation from vi to vj.
One embodiment of the present invention is shown in
From the perspective of probability, the weighting factors for attenuatable relations may be interpreted as predetermined probabilities of selecting the next node to traverse from the current node's neighbors when searching a social graph. Since the next node to visit is always one of vi's neighbors, the weighting factors for all relations sourced from vi sum to 1. That is,

$$\sum_{j} w_{ij} = 1,$$

where the sum runs over all nodes vj that have a relation with vi.
Note that wij and wji are not necessarily equal. For this reason, the original undirected graph G(V, E) is converted to a directed graph G′(V, W), in which each undirected edge eij/eji of G is split into two directed edges of G′ with weighting factors wij and wji respectively.
wij may be obtained from the closeness of relation from vi to vj in a social graph. In one embodiment of the present invention, it may be derived from the communications between nodes vi and vj. In another embodiment of the present invention, it may depend on the nodes' importance ranks ri and rj, which may be calculated from the nodes' profiles, join times, last access times, activities, locations, interests, membership of groups, events and preferences.
In one embodiment of the present invention, if there is no relation closeness information available, the weighting factor for the attenuatable relation from vi to vj in a social graph G may be calculated as
$$w_{ij} = \frac{1}{n_i},$$

where n_i is the number of relations node vi has in G.
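As a minimal sketch, assuming the undirected graph arrives as an edge list and the directed graph G′ is stored as a dict of dicts (`weights[vi][vj] = wij`), the conversion with uniform weights (1/n for each of a node's n relations) could look like this. The function name `uniform_weights` is hypothetical.

```python
from collections import defaultdict


def uniform_weights(edges):
    """Build the directed weighted graph G'(V, W) from an undirected edge list.

    Each undirected edge {vi, vj} becomes two directed edges, vi -> vj and
    vj -> vi. Without closeness information, every relation sourced from vi
    gets weight 1/n, where n is the number of relations vi has, so the
    weights sourced from each node sum to 1.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops; a relation needs two distinct nodes
        neighbors[u].add(v)
        neighbors[v].add(u)
    weights = {}
    for u, nbrs in neighbors.items():
        n = len(nbrs)
        weights[u] = {v: 1.0 / n for v in nbrs}
    return weights


if __name__ == "__main__":
    g = uniform_weights([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
    print(g["A"])  # each of A's three relations gets weight 1/3
    print(g["B"])  # B has two relations, so each gets 0.5
```

Because wij and wji are computed from different neighbor counts, the resulting directed weights are asymmetric in general, as noted above.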
Proximities between two nodes describe closeness of the two nodes in a social graph. If the proximity from one node to another is large, the relation between them is close too. Proximities may be calculated from the weighting factors for relations in the graphs. More specifically, the proximities between two nodes are determined from the weighting factors for relations on the paths connecting the two nodes.
There may be a number of paths from a first node to a second node. If the propagated relations between two nodes are attenuatable, a path proximity may be defined to describe the propagated relation from the first node to the second node along a path. In one embodiment of the present invention, the proximity pij from node vi to vj is defined as

$$p_{ij} = \max_{l} \, pp_{ij}^{l},$$
which is the maximum path proximity from vi to vj. Here $pp_{ij}^{l}$ is the proximity of path l, one of the paths connecting vi to vj.
Similar to the asymmetry of weighting factors, proximities are asymmetric as well. Specifically, proximity pij may not be equal to pji.
The proximity of a path may be calculated from the weighting factors for relations on the path: the probability of reaching node vj from vi along a path is the product of the probabilities of the connections on that path. Therefore, in one embodiment of the present invention, the path proximity $pp_{ij}^{l}$ may be calculated as
$$pp_{ij}^{l} = \prod_{(s,t) \in l} w_{st}$$
where wst is the weighting factor for the relation from vs to vt on path l connecting vi to vj.
The propagation of an attenuatable relation across neighboring nodes should be an attenuating process. A propagation coefficient α in the interval [0, 1] is therefore defined. Accordingly, in one embodiment of the present invention, the path proximity $pp_{ij}^{l}$ may be defined as
$$pp_{ij}^{l} = \prod_{(s,t) \in l} w'_{st}$$
where w′st = α·wst for every connection on the path except the last one, and w′st = wst for the last connection. Equivalently, for a path l with L connections, $pp_{ij}^{l} = \alpha^{L-1} \prod_{(s,t) \in l} w_{st}$.
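The following Python sketch, an illustration rather than the method's reference implementation, computes a path proximity from an explicit node sequence and takes the maximum over a set of candidate paths. It assumes the weighting factors are stored as a dict of dicts (`weights[s][t] = w_st`); the names `path_proximity` and `proximity` are hypothetical. Setting `alpha=1.0` recovers the unattenuated product.

```python
from math import prod


def path_proximity(weights, path, alpha=1.0):
    """Path proximity pp_ij^l for a path given as a node sequence [vi, ..., vj].

    Every connection except the last one is scaled by the propagation
    coefficient alpha (w'_st = alpha * w_st); the last keeps w_st.
    """
    if len(path) < 2:
        raise ValueError("a path needs at least two nodes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    hops = list(zip(path, path[1:]))
    factors = [alpha * weights[s][t] for s, t in hops[:-1]]
    s, t = hops[-1]
    factors.append(weights[s][t])  # the last connection is not attenuated
    return prod(factors)


def proximity(weights, paths, alpha=1.0):
    """p_ij as the maximum path proximity over candidate paths (0 if there are none)."""
    return max((path_proximity(weights, p, alpha) for p in paths), default=0.0)


if __name__ == "__main__":
    w = {"A": {"B": 0.2, "C": 0.8}, "C": {"A": 0.5, "B": 0.5}}
    print(proximity(w, [["A", "B"], ["A", "C", "B"]], alpha=0.6))  # about 0.24
```

Enumerating paths explicitly is only practical for small examples; a traversal that prunes low-proximity paths is needed for real graphs.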
Given the six degrees of separation, one recommendation is to select α such that $\alpha^{7} = \varepsilon$, i.e. $\alpha = \varepsilon^{1/7}$, where ε is the truncation error of the method. For instance, if ε is 0.001, α ≈ 0.373.
One embodiment of the present invention is
The social proximity metric may be counter-intuitive: the largest path proximity between two nodes is not necessarily the path proximity of the direct connection between them.
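As a hypothetical illustration (the weights are chosen for this example, not taken from the figures), let A have relations to B and C with w_AB = 0.1 and w_AC = 0.9, let w_CB = 0.5, and take α = 0.373. Comparing the direct path A→B with the two-hop path A→C→B:

$$
\begin{aligned}
pp_{AB}^{1} &= w_{AB} = 0.1,\\
pp_{AB}^{2} &= \alpha \, w_{AC} \, w_{CB} = 0.373 \times 0.9 \times 0.5 \approx 0.168.
\end{aligned}
$$

Hence $p_{AB} = \max(0.1, 0.168) \approx 0.168$, and the indirect path through C, not the direct connection, determines the proximity.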
In one embodiment of the present invention, iterative deepening depth-first traversal may be applied starting from a source node. The depth limit for the iterative deepening depth-first traversal is a predetermined depth, for instance 6. If the product of the weighting factors for relations on a path is smaller than a predefined truncation error ε, propagation along this path is stopped. Furthermore, the neighbors of a source node are visited in the order of non-increasing weighting factors.
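A minimal Python sketch of this traversal follows. It assumes the weighted graph is a dict of dicts (`weights[u][v] = w_uv`, values in [0, 1]), applies the truncation test to the attenuated path proximity, orders neighbors by weight at every node rather than only at the source, and skips cycles, since revisiting a node can only multiply in further factors no larger than 1. The function name `proximities_iddfs` is hypothetical; when `alpha` is omitted, it uses α = ε^(1/7) as recommended above.

```python
def proximities_iddfs(weights, source, alpha=None, epsilon=0.001, max_depth=6):
    """Approximate attenuatable proximities from `source` by iterative deepening DFS.

    A path of L connections has proximity alpha**(L - 1) * prod(w): every
    connection except the last is scaled by alpha. Returns {node: proximity}
    for nodes reached before truncation; unreached nodes have proximity 0.
    """
    if alpha is None:
        alpha = epsilon ** (1 / 7)  # recommendation: alpha**7 == epsilon
    best = {}

    def dfs(node, value, depth, on_path):
        if depth == 0:
            return
        # Visit neighbors in non-increasing order of weighting factor.
        ranked = sorted(weights.get(node, {}).items(), key=lambda kv: kv[1], reverse=True)
        for nxt, w in ranked:
            if nxt in on_path:
                continue  # a cycle can only lower the product, so skip it
            # Extending a path turns its previous last connection into an
            # attenuated one: multiply by alpha, then by the new last weight.
            pp = w if value is None else value * alpha * w
            if pp < epsilon:
                continue  # truncation: stop propagating along this path
            if pp > best.get(nxt, 0.0):
                best[nxt] = pp  # keep the maximum path proximity
            on_path.add(nxt)
            dfs(nxt, pp, depth - 1, on_path)
            on_path.remove(nxt)

    # Iterative deepening: depth-limited DFS with limits 1, 2, ..., max_depth.
    for limit in range(1, max_depth + 1):
        dfs(source, None, limit, {source})
    return best


if __name__ == "__main__":
    g = {"A": {"B": 0.5, "C": 0.5}, "B": {"A": 1.0}, "C": {"A": 1.0}}
    print(proximities_iddfs(g, "C", alpha=0.373))  # A: 1.0, B: about 0.1865
```

In this sketch each deeper pass revisits all shallower paths, so the final pass alone determines the result; the earlier passes only provide progressively refined estimates that could be used if the traversal were stopped early.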
When the proximities between nodes are available, search in social graphs may be conducted from source nodes in the non-increasing order of proximities from the source nodes. Nodes with larger proximities from the source nodes are searched first. The search may be stopped if the proximities from the source nodes are smaller than a predetermined cutoff proximity.
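A sketch of the ordered search with a cutoff, assuming the proximities from the source nodes have already been computed and combined into a single mapping (for several source nodes, one hypothetical choice is to take the maximum over sources). The `matches` predicate stands in for whatever query the search evaluates; `search` and `matches` are illustrative names.

```python
def search(proximities, matches, cutoff):
    """Return matched nodes in non-increasing order of proximity from the source.

    proximities maps node -> proximity; nodes below `cutoff` are never examined.
    """
    results = []
    for node, p in sorted(proximities.items(), key=lambda kv: kv[1], reverse=True):
        if p < cutoff:
            break  # every remaining node has an equal or smaller proximity
        if matches(node):
            results.append((node, p))
    return results


if __name__ == "__main__":
    prox = {"A": 1.0, "B": 0.1865, "D": 0.05, "E": 0.3}
    print(search(prox, lambda n: n != "A", cutoff=0.1))  # [('E', 0.3), ('B', 0.1865)]
```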
Moreover, distances between nodes in social graphs may be derived from the calculated proximities. In one embodiment of the present invention, the distance from a first node to a second node may be calculated as the reciprocal of the proximity from the first node to the second node.
Based on the calculated distances, clusters may be created from social graphs to enhance the performance of social search. Various clustering techniques may be used. In one embodiment of the present invention, density based clustering may be used. In another embodiment of the present invention, the hierarchical approaches may be used. The hierarchical clustering may be created in various ways. In one embodiment of the present invention, a hierarchy may be created in an agglomerative way. In another embodiment of the present invention, a hierarchy may be created in a divisive way.
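As one possible instance of the agglomerative option, the sketch below converts proximities to distances (reciprocal, with zero proximity mapped to infinity) and cuts a single-linkage hierarchy at a distance threshold. Single linkage and the threshold are illustrative choices, and because proximities are asymmetric the sketch assumes the pair distance is the smaller of the two directed distances. The names `distance` and `single_link_clusters` are hypothetical.

```python
import math


def distance(p):
    """Distance as the reciprocal of proximity; zero proximity means unreachable."""
    return math.inf if p == 0 else 1.0 / p


def single_link_clusters(nodes, prox, threshold):
    """Single-linkage clusters cut at `threshold`.

    prox[(u, v)] is the proximity from u to v. Cutting a single-linkage
    hierarchy at `threshold` yields the connected components of the graph
    whose edges have distance <= threshold, computed here with union-find.
    """
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (u, v), p in prox.items():
        # Symmetrize (assumption): use the smaller of the two directed distances.
        d = min(distance(p), distance(prox.get((v, u), 0.0)))
        if d <= threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for v in nodes:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(clusters.values(), key=lambda c: sorted(c))


if __name__ == "__main__":
    prox = {("A", "B"): 0.5, ("B", "A"): 0.2, ("C", "D"): 0.25, ("B", "C"): 0.01}
    print(single_link_clusters(["A", "B", "C", "D"], prox, threshold=4.0))
```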
Some social graphs may represent the same type of relation. For instance, some users may have accounts in both Facebook and Google+. Accordingly, the Facebook graph GFacebook and the Google+ graph GGoogle+ share some common users. Moreover, as both GFacebook and GGoogle+ represent friendship between users, it is possible to merge these two graphs into one graph G′. The users' ranks and the relations' weighting factors in G′ may be calculated from the users' ranks and the relations' weighting factors in GFacebook and GGoogle+.
Social graphs representing the same type of relation may be merged in various ways. Suppose there are m graphs Gk, where k ∈ [0, m−1], all representing the same type of relation, and let $w_{ij}^{k}$ be the weighting factor for the relation from vi to vj in Gk. In one embodiment of the present invention, the weighting factor w′ij for the relation from vi to vj in the merged graph G′ may be calculated as a weighted sum of the weighting factors $w_{ij}^{k}$ in the original graphs, where k ∈ [0, m−1]. The weighting for relations sourced from vi in Gk is $ww_{i}^{k}$, which represents the importance of Gk's relations from vi in the merged graph G′. In one embodiment of the present invention, $ww_{i}^{k}$ may be determined in terms of the communications sourced from node vi.
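The un-normalized weighting factor w″ij for the relation from vi to vj in G′ may be calculated as

$$w''_{ij} = \sum_{k=0}^{m-1} ww_{i}^{k} \, w_{ij}^{k},$$

where $w_{ij}^{k} = 0$ if vi has no relation to vj in Gk.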
The normalized weighting factor w′ij for the relation in G′ may be computed as

$$w'_{ij} = \frac{w''_{ij}}{\sum_{l} w''_{il}}$$
where vl is one of the nodes having relations with vi in G′ including vj. w″il is the un-normalized weighting factor for the relation from vi to vl in G′. The denominator is a sum of all the un-normalized weighting factors for relations sourced from vi in G′.
For example, user A has relations with B and D in G′. The un-normalized weighting factor for the relation from A to B is $w''_{AB} = ww_{A}^{0} w_{AB}^{0} + ww_{A}^{1} w_{AB}^{1} = 0.5 \times 1.0 + 0.5 \times 0.5 = 0.75$. The un-normalized weighting factor for the relation from A to D is $w''_{AD} = ww_{A}^{1} w_{AD}^{1} = 0.5 \times 0.5 = 0.25$. After normalization, $w'_{AB} = w''_{AB} / (w''_{AB} + w''_{AD}) = 0.75 / (0.75 + 0.25) = 0.75$ and $w'_{AD} = w''_{AD} / (w''_{AB} + w''_{AD}) = 0.25 / (0.75 + 0.25) = 0.25$. Similarly, w′BA and w′BC may be calculated as 0.6 and 0.4 respectively.
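The merge can be sketched in Python as below, assuming each graph is a dict of dicts (`graphs[k][i][j]` holds the weight from vi to vj in Gk) and `graph_weights[k][i]` holds the importance of Gk's relations from vi. The function name is hypothetical, and treating a missing importance as 0 is an assumption of this sketch. The usage reproduces the numbers for user A above.

```python
def merge_weights(graphs, graph_weights):
    """Merge same-type graphs: weighted sum per relation, then normalize per source node."""
    unnormalized = {}
    for k, g in enumerate(graphs):
        for i, row in g.items():
            ww = graph_weights[k].get(i, 0.0)  # assumption: missing importance counts as 0
            acc = unnormalized.setdefault(i, {})
            for j, w in row.items():
                acc[j] = acc.get(j, 0.0) + ww * w  # w''_ij = sum over k of ww_i^k * w_ij^k
    merged = {}
    for i, row in unnormalized.items():
        total = sum(row.values())
        if total > 0:  # nodes whose relations all carry zero importance are dropped
            merged[i] = {j: w / total for j, w in row.items()}  # w'_ij
    return merged


if __name__ == "__main__":
    g0 = {"A": {"B": 1.0}}
    g1 = {"A": {"B": 0.5, "D": 0.5}}
    print(merge_weights([g0, g1], [{"A": 0.5}, {"A": 0.5}]))  # A: B 0.75, D 0.25
```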
As mentioned earlier, not all relations are attenuatable. If relations for a node in the graphs are not attenuatable, the weighting factors for the node's relations may not be interpreted from the probability perspective.
One example is given in
There is no relation from user C to restaurant R, which means user C may never have been to restaurant R. C may ask his/her friends A and B about restaurant R and thereby form an opinion about it from theirs. The restaurant-customer relation is thus not attenuated when propagated, and the present invention defines its propagation attribute to be non-attenuatable.
As stated previously, calculating proximities of non-attenuatable relation between nodes may be distinct from calculating proximities of attenuatable relation between nodes. In
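Restaurant R may also use G0 and G1 to find possible new customers. Assume R's opinions about A and B are $w_{RA}^{1} = 5$ and $w_{RB}^{1} = 4$ respectively, which serve as R's proximities $p_{RA}^{1}$ and $p_{RB}^{1}$, and that C's proximities of attenuatable relation to A and B in G0 are $p_{CA}^{0} = 1.0$ and $p_{CB}^{0} = 0.187$. R's proximity of non-attenuatable relation to C, i.e. R's opinion about C, may then be calculated as

$$p_{RC}^{1} = \frac{p_{CA}^{0}}{p_{CA}^{0} + p_{CB}^{0}} \, p_{RA}^{1} + \frac{p_{CB}^{0}}{p_{CA}^{0} + p_{CB}^{0}} \, p_{RB}^{1} = \frac{1.0}{1.0 + 0.187} \times 5 + \frac{0.187}{1.0 + 0.187} \times 4 \approx 4.842.$$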
Assuming vx is a node with attenuatable relations and vy is a node with non-attenuatable relations, one embodiment of the present invention may calculate the proximity of non-attenuatable relation from vx to vy as

$$p_{xy} = \sum_{i} pw_{xi} \, p_{iy},$$
where vi is one of the nodes having non-attenuatable relations with vy and connected to vx by a path consisting only of attenuatable relations. piy is the proximity of non-attenuatable relation from vi to vy, and pwxi describes the importance of piy in pxy. In one embodiment of the present invention, pwxi may be determined as

$$pw_{xi} = \frac{p_{xi}}{\sum_{j} p_{xj}},$$
where pxi is the proximity of attenuatable relation from vx to vi. vj is one of the nodes having non-attenuatable relation with vy and connected to vx by a path with all attenuatable relations on the path. pxj is the proximity of attenuatable relation from vx to vj. The denominator is a sum of all the proximities of attenuatable relation from vx to the nodes having non-attenuatable relations with vy.
Similarly, in one embodiment of the present invention, the proximity of non-attenuatable relation from node vy to vx may be calculated as
$$p_{yx} = \sum_{i} pw_{xi} \, p_{yi}$$
where vi is one of the nodes having non-attenuatable relation with vy and connected to vx by a path with all attenuatable relations on the path. pyi is the proximity of non-attenuatable relation from vy to vi.
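Both directions can be sketched with one helper, assuming the caller has already found the intermediary nodes vi together with their attenuatable proximities from vx and their non-attenuatable proximities to or from vy. The function names are hypothetical. The usage reproduces the restaurant example above, with vx = C and vy = R.

```python
def importance_weights(att_prox):
    """pw_xi = p_xi / sum_j p_xj, given att_prox = {vi: p_xi} over the intermediaries."""
    total = sum(att_prox.values())
    if total == 0:
        return {}
    return {i: p / total for i, p in att_prox.items()}


def non_attenuatable_proximity(att_prox, non_att_prox):
    """Weighted combination sum_i pw_xi * q_i.

    Pass q_i = p_iy to obtain p_xy, or q_i = p_yi to obtain p_yx.
    """
    pw = importance_weights(att_prox)
    return sum(pw[i] * non_att_prox[i] for i in pw)


if __name__ == "__main__":
    # C's attenuatable proximities to A and B, and R's opinions about A and B.
    print(non_attenuatable_proximity({"A": 1.0, "B": 0.187}, {"A": 5, "B": 4}))  # about 4.842
```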
When proximities between nodes are calculated, search in social graphs may be performed in the order of non-increasing proximities from the source nodes. When presenting the search results, the matched nodes' information/URL links may be listed. The proximities from the source nodes to the matched nodes may be displayed. Moreover, the paths connecting the source nodes to the matched nodes with the maximum path proximities may also be displayed.
It should be noted that the present invention may be applied to one or more social graphs obtained from one or more social networking services.
The present invention has been disclosed and described with respect to the herein disclosed embodiments. However, these embodiments should be considered in all respects as illustrative and not restrictive. Other forms of the present invention could be made within the spirit and scope of the invention.