The embodiments herein generally relate to finding actionable communities within a social network and, more particularly, to finding communities based on (a) the number of primitive communities a group of entities belongs to, and (b) the number of products that the group of entities has purchased together. Cliques, dense groups, and k-plexes are examples of primitive communities.
Finding communities in social networks is a well-studied topic. A community in a social network is commonly defined as a group of people who interact more within the group than outside it. It is believed that people's behavior can be linked to the behavior of their social neighborhood. However, it has not been demonstrated that social communities are like-minded, i.e., that their members behave similarly in terms of their interest in items (e.g., movies, products, services).
Thus, there remains a need to find actionable communities within a social network based on (a) the number of primitive communities a group of entities belongs to, and (b) the number of products that the group of entities has purchased together.
According to one embodiment, a computer-implemented method includes identifying a social network of a plurality of entities, where each entity is associated with at least one other entity in the social network. A first group of entities that have expressed an interest in a first item is identified from the plurality of entities. At least one first primitive community is identified from the first group of entities. An example of a primitive community is a clique, which is defined as a sub-group of the group of entities in which every pair of entities is associated. A second group of entities that have expressed an interest in a second item is identified from the plurality of entities. At least one second primitive community is identified from the second group of entities. Common sets of entities are identified within the at least one first primitive community of the first group and the at least one second primitive community of the second group based on a minimum support level, using a frequent itemset mining approach in which the primitive communities are treated as transactions and the entities are treated as items. A union including all distinct entities is determined from the common sets of entities, and at least one community of entities is determined from the plurality of entities of the social network based on the union of all entities applied to the social network.
According to another embodiment, a computer-implemented method includes identifying a social network of a plurality of entities, each entity being associated with at least one other entity in the social network. A first group of entities that have purchased a first item is identified from the plurality of entities, and at least one first primitive community is identified from the first group of entities. A second group of entities that have purchased a second item is identified from the plurality of entities, and at least one second primitive community is identified from the second group of entities. Common sets of entities are determined within the at least one first primitive community of the first group and the at least one second primitive community of the second group based on a minimum support level, using a frequent itemset mining approach in which the primitive communities are treated as transactions and the entities are treated as items. A union including all distinct entities is determined from the common sets of entities, and at least one community of entities is determined from the plurality of entities of the social network based on the union of all entities applied to the social network.
According to another embodiment, a computer-implemented method includes identifying a social network of a plurality of entities, each entity being associated with at least one other entity in the social network. A group of entities that have expressed an interest in any one of a plurality of items is identified from the plurality of entities. A primitive community is determined from the group of entities for each distinct item of the plurality of items. Common sets of entities are determined within each primitive community associated with each distinct item of the plurality of items based on a minimum support level, using a frequent itemset mining approach in which the primitive communities are treated as transactions and the entities are treated as items. A union including all distinct entities is determined from the common sets of entities, and at least one community of entities is determined from the plurality of entities of the social network based on the union of all entities applied to the social network.
According to another embodiment, a non-transitory computer storage medium is readable by a computer and tangibly embodies a program of instructions executable by the computer for performing a method that includes identifying a social network of a plurality of entities, each entity being associated with at least one other entity in the social network. A first group of entities that have expressed an interest in a first item is identified from the plurality of entities. At least one first primitive community is identified from the first group of entities. A second group of entities that have expressed an interest in a second item is identified from the plurality of entities. At least one second primitive community is identified from the second group of entities. Common sets of entities are identified within the at least one first primitive community of the first group and the at least one second primitive community of the second group based on a minimum support level, using a frequent itemset mining approach in which the primitive communities are treated as transactions and the entities are treated as items. A union including all distinct entities is determined from the common sets of entities, and at least one community of entities is determined from the plurality of entities of the social network based on the union of all entities applied to the social network.
People who share an interest in a particular item are usually socially well connected. Motivated by this observation, an embodiment includes a method for finding communities in which like-mindedness is an explicit objective. Small, tight groups with many shared interests are found by first finding primitive communities, then applying a frequent itemset mining approach, and using the resulting frequent itemsets as building blocks to determine the core memberships of like-minded (“actionable”) communities. The actionable communities can be extended by adding suitable members to the cores. These like-minded communities have greater similarity in their interests than (primitive) communities found using only the interaction information. They can be used for viral marketing, since their members are both like-minded and socially well connected.
Forming communities is one of the fundamental traits of people in social networks. Many methods have been proposed and investigated for finding communities in social networks, based on the communication patterns in those networks. A community in a social network is defined as a group of people whose interactions within the group exceed their interactions outside the group. Communities have often been linked to common characteristics of their members. In fact, the presence of a common characteristic among the members of a community is often taken as an indicator of the quality of the communities found. For example, the well-known Zachary's karate club and NCAA college football datasets are used to test the quality of community-finding algorithms, with membership in the faction or conference taken as the ground truth.
While shared characteristics, such as the language spoken by the majority of the members of a community, have been used to evaluate the quality of found communities, they are not taken as input when determining communities. In this disclosure, items are taken as an explicit input when determining the community structure. In addition, rather than characterizing a solution by a single attribute, the interests of social network participants in multiple items are taken into account. The notion of like-mindedness for communities is also formalized.
Such communities are of significant practical importance, since their members are both well connected and like-minded. Hence, viral marketing strategies are more likely to succeed if initiated on such communities.
Recently, rich data has become available that contains information about the social network as well as purchase or like/dislike data for the social network participants. One example is viewers' ratings of movies together with the friendship relations amongst the viewers. Another example is the telecom domain, where call detail records (CDR) as well as purchases of or subscriptions to value added services (VAS) are available.
It is believed that people's behavior can be linked to the behavior of their social neighborhood. The influence of the social neighborhood has been demonstrated in the churn behavior of telecom subscribers. People who are interested in an item are usually socially well connected.
Moreover, people who like (or dislike) an item together are even more socially connected. However, information about people's interests is not used in existing community-finding algorithms.
A community-finding problem is defined in which the participants of a community also share many interests, and an embodiment herein provides an approach to find such communities. The approach is flexible and operates at a meta level. It consists of finding tight communities, performing frequent itemset mining to find the core members of the like-minded communities, and finding the community structure amongst the core members. Communities found according to the embodiment herein exhibit a stronger tendency to like/dislike items together than communities found using only social interaction data.
Thus, a group of social network participants is defined as a like-minded community if it has the following properties:
1) The group is well connected in the social sense;
2) The group of people share many interests; and
3) The group is maximal, i.e., there is no larger group that contains this group and also has properties 1 and 2.
Conditions 1 and 2 are both flexible: the required degree of connectedness and the number of shared interests can be controlled by the user.
Modularity is a fair indicator of social well-connectedness, and it has been generalized in several ways. In particular, generalized modularity can account for overlapping communities and is equivalent to the conventional definition for non-overlapping communities:

$$Q = \frac{1}{2m}\sum_{u,v \in V}\left[A_{uv} - \frac{d_u d_v}{2m}\right]\delta(u, v)$$

where $m$ is the number of undirected edges in the graph (i.e., $|E|$), $A_{uv}$ is the weight of the edge between $u$ and $v$, and $d_u$ is the degree of node $u$. $L_u$ is the set of communities of which $u$ is a member, and $\delta(u, v)$ is 1 if $|L_u \cap L_v| > 0$ and 0 otherwise.
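As a minimal sketch, the overlapping modularity can be computed in Python for an unweighted, undirected graph. It assumes the form $Q = \frac{1}{2m}\sum_{u,v}\left[A_{uv} - \frac{d_u d_v}{2m}\right]\delta(u, v)$, summed over all ordered pairs of nodes, with $\delta(u, v) = 1$ when $u$ and $v$ share at least one community. The function name `generalized_modularity` and the edge-list input are hypothetical choices for illustration.

```python
from itertools import product
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def generalized_modularity(
    edges: Iterable[Tuple[Hashable, Hashable]],
    communities: List[Set[Hashable]],
) -> float:
    """Modularity where delta(u, v) = 1 if u and v share any community."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of undirected edges
    if m == 0:
        raise ValueError("graph has no edges")
    # L_u: indices of the communities that contain node u
    member_of: Dict[Hashable, Set[int]] = {u: set() for u in adj}
    for idx, comm in enumerate(communities):
        for u in comm:
            member_of.setdefault(u, set()).add(idx)
    q = 0.0
    for u, v in product(adj, repeat=2):  # all ordered pairs, including u == v
        if member_of[u] & member_of[v]:  # delta(u, v) = 1
            a_uv = 1.0 if v in adj[u] else 0.0
            q += a_uv - len(adj[u]) * len(adj[v]) / (2 * m)
    return q / (2 * m)


two_triangles = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]
q_split = generalized_modularity(two_triangles, [{"a", "b", "c"}, {"d", "e", "f"}])
```

For two disjoint triangles with one community per triangle this evaluates to 0.5. Putting all six nodes in a single community gives 0, up to floating-point rounding.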
To formalize the notion of like-mindedness, a new measure is introduced that captures the level of compatible interests amongst the community members. Let $C$ denote a set of communities on a social network graph $G$ and bipartite rating graph $B$. Let $c \in C$ be a community, and let $u$ and $v$ be two members of $c$. The similarity between $u$ and $v$ is defined as the cosine similarity of their ratings:

$$\mathrm{Cos}(u, v) = \frac{R_u \cdot R_v}{\lVert R_u \rVert \, \lVert R_v \rVert}$$
where $R_u$ is the rating vector of user $u$. The numerator is the dot product of the two rating vectors and the denominator is the product of their norms. When two users have not rated enough items in common, the cosine similarity can be misleading. For example, if two people rate only one movie in common and rate it in exactly the same way, assigning them a similarity of 1 gives a misleading picture. The measure is also not very robust in this case, since a change in a single rating can change the similarity value by a large amount. Therefore, the cosine similarity is scaled down when there are not enough ratings in common: if the number $w$ of movies rated in common by $u$ and $v$ is less than 50, the cosine similarity is scaled by $w/50$. That is, the weighted cosine similarity between $u$ and $v$ is given as:

$$\mathrm{Cos}_{wt}(u, v) = \frac{\min(w, 50)}{50}\,\mathrm{Cos}(u, v)$$
However, any rating similarity suitable for the problem domain can be used instead, e.g., one based on the L1 or L2 distance between the ratings.
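The two similarity measures can be sketched in Python as follows. This is a minimal illustration. It assumes that each user's ratings are stored as a dictionary from item to rating and that unrated items count as 0 in the rating vector. The function names and the `min_common` parameter, which defaults to the threshold of 50 used above, are hypothetical.

```python
import math
from typing import Dict, Hashable

Ratings = Dict[Hashable, float]  # item -> rating given by one user


def cosine(ru: Ratings, rv: Ratings) -> float:
    """Cos(u, v): dot product of the rating vectors over the product of their norms.

    Items a user has not rated are treated as 0 in that user's vector.
    """
    dot = sum(r * rv[item] for item, r in ru.items() if item in rv)
    norm_u = math.sqrt(sum(r * r for r in ru.values()))
    norm_v = math.sqrt(sum(r * r for r in rv.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_cosine(ru: Ratings, rv: Ratings, min_common: int = 50) -> float:
    """Cos_wt(u, v): Cos(u, v) scaled by w / min_common when w < min_common."""
    w = len(ru.keys() & rv.keys())  # number of items rated in common
    return min(w, min_common) / min_common * cosine(ru, rv)


ru = {"m1": 5, "m2": 3}
rv = {"m1": 5, "m2": 3}
```

Consider two users who rated the same two movies identically. For them, `cosine` gives 1, while `weighted_cosine` gives only 2/50 = 0.04, which reflects the small overlap.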
The like-mindedness of a community structure C is defined as:
where $\delta(u, v)$ is 1 if $u$ and $v$ belong to the same community and 0 if they belong to different communities. This gives a measure of the like-mindedness of the people. It is reminiscent of the first term of modularity, with the similarity of the two users used in place of the edge weight (which may be binary for unweighted graphs). However, unlike modularity, no notion of expected similarity is required here. Taking the similarity between $u$ and $v$ as $\mathrm{Cos}(u, v)$ or $\mathrm{Cos}_{wt}(u, v)$ yields two versions of the similarity score for the community set $C$, denoted $S_{cos}(C)$ and $S_{wc}(C)$, respectively.
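The exact normalization of the like-mindedness score is not spelled out here. The following Python sketch therefore simply sums a pairwise similarity over all unordered pairs of distinct users that share at least one community, counting each pair once as the 0/1 value of $\delta(u, v)$ suggests. The absence of normalization and the function name `like_mindedness` are assumptions. Passing Cos or Cos_wt as `sim` would give the $S_{cos}$ and $S_{wc}$ variants.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, Set


def like_mindedness(
    communities: Iterable[Set[Hashable]],
    sim: Callable[[Hashable, Hashable], float],
) -> float:
    """Sum sim(u, v) over unordered pairs of distinct users sharing a community.

    delta(u, v) is 0/1, so a pair that shares several communities is counted once.
    No normalization is applied (an assumption of this sketch).
    """
    pairs: Set[FrozenSet[Hashable]] = {
        frozenset(pair) for comm in communities for pair in combinations(comm, 2)
    }
    total = 0.0
    for pair in pairs:
        u, v = tuple(pair)
        total += sim(u, v)
    return total


score = like_mindedness([{"a", "b", "c"}, {"c", "d"}, {"a", "b"}], lambda u, v: 1.0)
```

With a constant similarity of 1, the score is the number of distinct co-member pairs: here ab, ac, bc and cd, for a total of 4. The pair ab appears in two communities but is counted only once.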
The approach to find like-minded communities is now described. A key feature of the approach is the flexibility it allows in the choices that need to be made. The approach operates at a meta level: suitable definitions, measures and algorithms can be chosen depending on the problem domain and the preferences of the user.
Recall that like-minded communities are socially tight-knit groups that share many interests. Let $G = (V, E)$ be the base graph that describes the social interaction, and let $B = (V, T, R)$ be the bipartite graph describing the interests $R$ of people $V$ in items $T$. Let $r$ denote a compatibility relation on the interests of people in items. If $u$ and $v$ have compatible interests in item $t_i$, this is denoted by $u \mathrel{r_i} v$. From these two graphs, $|T|$ graphs are derived as follows. For each item $t_i$, a graph $G_i^r = (V_i^r, E_i^r)$ is constructed such that $e = (u, v) \in E_i^r$ if $e \in E$ and $u \mathrel{r_i} v$. Further, $V_i^r$ is the set of vertices $v$ such that some $e \in E_i^r$ is incident on $v$. In other words, for each item, only those edges of the base graph are retained for which the interests of the nodes at the two ends of the edge are compatible under the chosen compatibility relation.
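The construction of the per-item graphs $G_i^r$ can be sketched in Python as below. The compatibility relation is passed in as a function. The `both_like` relation used in the example (both users rated the item 4 or higher), the sample ratings, and all names are hypothetical choices for illustration, not part of the described method.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def item_graphs(
    edges: Iterable[Edge],
    items: Iterable[Hashable],
    compatible: Callable[[Hashable, Hashable, Hashable], bool],
) -> Dict[Hashable, Tuple[Set[Hashable], Set[Edge]]]:
    """Build G_i^r = (V_i^r, E_i^r) for every item i.

    A base-graph edge is kept for item i exactly when its two endpoints have
    compatible interest in i; V_i^r holds the endpoints of the kept edges.
    """
    edges = list(edges)
    graphs = {}
    for item in items:
        kept = {(u, v) for u, v in edges if compatible(u, v, item)}
        nodes = {x for e in kept for x in e}
        graphs[item] = (nodes, kept)
    return graphs


# Hypothetical ratings and compatibility relation: both users rated the item >= 4.
ratings = {
    "u1": {"m1": 5, "m2": 2},
    "u2": {"m1": 4, "m2": 5},
    "u3": {"m1": 1, "m2": 4},
}


def both_like(u, v, item, threshold=4):
    return ratings[u].get(item, 0) >= threshold and ratings[v].get(item, 0) >= threshold


g = item_graphs([("u1", "u2"), ("u2", "u3")], ["m1", "m2"], both_like)
```

In this example, the friendship u1–u2 survives only in the graph for m1, and u2–u3 survives only in the graph for m2.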
Primitive communities are found in each of these graphs $G_i^r$, one graph per item. That is, the set of vertices is divided into primitive communities, which are socially tight-knit groups, typically with more intra-community interactions and fewer inter-community interactions. These primitive communities are used as the building blocks for the like-minded communities.
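Cliques are one example of a primitive community. As a minimal sketch, the maximal cliques of a graph $G_i^r$ can be enumerated with the classic Bron-Kerbosch algorithm with pivoting. Any other primitive community definition, such as dense groups or k-plexes, could be substituted. The adjacency-dictionary input, the `min_size` parameter, and the function name are assumptions.

```python
from typing import Dict, FrozenSet, Hashable, List, Set


def maximal_cliques(
    adj: Dict[Hashable, Set[Hashable]], min_size: int = 2
) -> List[FrozenSet[Hashable]]:
    """Bron-Kerbosch with pivoting; returns maximal cliques of >= min_size nodes.

    adj maps every node to the set of its neighbours (undirected, no self-loops).
    """
    cliques: List[FrozenSet[Hashable]] = []

    def expand(r: Set, p: Set, x: Set) -> None:
        # r: current clique, p: candidates to add, x: already-explored nodes
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(frozenset(r))
            return
        # Pick the pivot that covers the most candidates to prune branches.
        pivot = max(p | x, key=lambda w: len(adj[w] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
cliques = maximal_cliques(adj)
```

For a triangle a–b–c with a pendant edge c–d, this yields the two maximal cliques {a, b, c} and {c, d}. Raising `min_size` to 3 keeps only the triangle.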
Next, each maximal primitive community is considered as a transaction, with the people as its items. The item for which the maximal primitive community was found is attached to the transaction as an additional attribute. Frequent itemset mining is then run on this dataset to find all the closed frequent itemsets. An itemset is called a closed frequent itemset if it has the minimum desired support (i.e., the set occurs together in a sufficient number of transactions) and no proper superset of it has equal support. While the closed frequent itemsets are being found, the distinct items $t_j$ in whose transactions each itemset appears are tracked. After the closed frequent itemsets are found, they are filtered to keep only those itemsets that appear for a sufficient number of distinct items $t_j$ of compatible interest, in addition to having the required support in the conventional sense.
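The mining step can be sketched in Python for small inputs. The sketch relies on the fact that the closed itemsets of a transaction database are exactly the non-empty intersections of subsets of its transactions, and it builds them incrementally. It then applies both supports: the number of transactions (primitive communities) containing the itemset, and the number of distinct items $t_j$ attached to those transactions. The names, thresholds and sample transactions are hypothetical, and a dedicated closed-itemset miner would be preferable at scale.

```python
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

People = FrozenSet[Hashable]
Transaction = Tuple[People, Hashable]  # (members of one primitive community, its item)


def closed_itemsets(transactions: List[Transaction]) -> Set[People]:
    """All closed itemsets with support >= 1: non-empty intersections of transactions."""
    closed: Set[People] = set()
    for people, _item in transactions:
        new = {people}
        for c in closed:
            common = c & people
            if common:
                new.add(common)
        closed |= new
    return closed


def mine_cores(
    transactions: List[Transaction], min_support: int, min_items: int
) -> Dict[People, Tuple[int, Set[Hashable]]]:
    """Keep closed itemsets meeting both the transaction and distinct-item supports."""
    cores: Dict[People, Tuple[int, Set[Hashable]]] = {}
    for itemset in closed_itemsets(transactions):
        items = [item for people, item in transactions if itemset <= people]
        support, distinct = len(items), set(items)
        if support >= min_support and len(distinct) >= min_items:
            cores[itemset] = (support, distinct)
    return cores


transactions = [
    (frozenset({"a", "b", "c"}), "i1"),
    (frozenset({"a", "b", "c", "d"}), "i2"),
    (frozenset({"b", "c", "e"}), "i3"),
]
cores = mine_cores(transactions, min_support=2, min_items=2)
```

Here {b, c} survives with support 3 across three distinct items, and {a, b, c} survives with support 2 across two items. The sets {a, b, c, d} and {b, c, e} are closed but fall below the support threshold.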
The union of all the closed frequent itemsets, which is a set of people, is then taken, and the subgraph induced by these people is formed. Communities on this graph are found using a community-finding method. The communities thus found are the like-minded communities.
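The final step can be sketched in Python as follows. The sketch assumes that the filtered itemsets are available as sets of people and that the base graph is given as an edge list. Connected components are used here only as a simple stand-in for a community-finding method; any method, for example one that maximizes modularity, could be plugged in. All function names and the sample data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def core_members(itemsets: Iterable[Iterable[Hashable]]) -> Set[Hashable]:
    """Union of all (filtered) closed frequent itemsets."""
    members: Set[Hashable] = set()
    for s in itemsets:
        members |= set(s)
    return members


def induced_subgraph(edges: Iterable[Edge], nodes: Set[Hashable]) -> Dict[Hashable, Set[Hashable]]:
    """Adjacency of the undirected subgraph induced by `nodes`."""
    adj: Dict[Hashable, Set[Hashable]] = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def connected_components(adj: Dict[Hashable, Set[Hashable]]) -> List[Set[Hashable]]:
    """Stand-in community finder: each connected component is one community."""
    seen: Set[Hashable] = set()
    comps: List[Set[Hashable]] = []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps


members = core_members([{"a", "b"}, {"b", "c"}, {"x", "y"}])
graph = induced_subgraph([("a", "b"), ("b", "c"), ("x", "y"), ("c", "z")], members)
communities = connected_components(graph)
```

The edge c–z is dropped because z belongs to no frequent itemset. The remaining core members split into the communities {a, b, c} and {x, y}.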
For each item in turn, a social network is defined as the sub-graph induced by the entities interested in that item, and communities are found in each such graph. The communities are analyzed using a frequent itemset analysis with two support parameters: the number of communities in which the entities participate together, and the number of distinct shared interests or products. In the frequent itemset algorithm, the communities are treated as the transactions and the entities as the items in each transaction. Core communities having core members are found using these support parameters. A core community is then expanded to include as many entities as desired, based on a maximum or threshold parameter. An advantage of the present embodiments is that they allow specific communities to be identified that can be monetized based on shared interests. The received data may come in two parts: the social network amongst the participants and the interests of entities in particular items.
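The expansion of a core community up to a maximum size can be illustrated with a simple greedy rule. The rule repeatedly adds the outside neighbor with the most links into the community, as long as that neighbor has at least `min_links` such links. This rule is a hypothetical choice, as are the names `expand_core`, `max_size` and `min_links`. The text above states only that expansion is bounded by a maximum or threshold parameter.

```python
from typing import Dict, Hashable, Set


def expand_core(
    core: Set[Hashable],
    adj: Dict[Hashable, Set[Hashable]],
    max_size: int,
    min_links: int = 2,
) -> Set[Hashable]:
    """Greedily grow a core community up to max_size members (hypothetical rule)."""
    community = set(core)
    while len(community) < max_size:
        # Outside nodes adjacent to at least one current member.
        frontier = {n for m in community for n in adj.get(m, set())} - community
        if not frontier:
            break
        # Most links into the community wins; repr(n) breaks ties deterministically.
        best = max(frontier, key=lambda n: (len(adj.get(n, set()) & community), repr(n)))
        if len(adj.get(best, set()) & community) < min_links:
            break
        community.add(best)
    return community


adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"a"},
}
grown = expand_core({"a", "b"}, adj, max_size=4)
```

Starting from the core {a, b}, the nodes c and d are each tied to both core members and are added. The node e, with a single link into the community, is never added, even when `max_size` is larger.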
C1 is associated with C2, C3, C5 and C10;
C2 is associated with C1, C3, C5 and C7;
C3 is associated with C1, C2, C4, C5 and C9;
C4 is associated with C3 and C8;
C5 is associated with C1, C2, C3 and C10;
C6 is associated with C7 and C8;
C7 is associated with C2, C6 and C8;
C8 is associated with C3, C4, C6 and C7;
C9 is associated with C3; and
C10 is associated with C1 and C5.
For example,
Next, a minimum support level of “2” 702 is set to identify sets of customers that appear together at least twice in the series of identified primitive communities. This minimum support level of “2” yields three sets (704): [C1, C2, C3] from the primitive communities of P1 and P2; [C6, C7, C8] from the primitive communities of P2 and P3; and [C2, C3, C5] from the primitive communities of P2, P3 and P4.
Combining all of these identified customers 710 yields the customers whose purchasing behavior is similar to that of their socially connected neighbors. The result 712 is customers [C1, C2, C3, C5, C6, C7, C8]. These are the customers most likely to be affected by their friends' choices of products.
A minimum support level of “3” 706 identifies sets of customers that appear together at least three times in the series of identified primitive communities. This minimum support level of “3” yields only one set (708): [C2, C3, C5], from the primitive communities of P2, P3 and P4.
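The arithmetic of this example can be checked with a short Python script. The script encodes the associations among customers C1 to C10 listed above, treated as undirected, together with the frequent sets reported for minimum support levels 2 and 3. It then takes the union of each group of sets and lists the associations induced among the support-3 core. The primitive communities of P1 to P4 themselves are not listed, so the frequent sets are entered directly rather than mined.

```python
# Associations among customers, as listed above (treated as undirected).
associations = {
    "C1": ["C2", "C3", "C5", "C10"],
    "C2": ["C1", "C3", "C5", "C7"],
    "C3": ["C1", "C2", "C4", "C5", "C9"],
    "C4": ["C3", "C8"],
    "C5": ["C1", "C2", "C3", "C10"],
    "C6": ["C7", "C8"],
    "C7": ["C2", "C6", "C8"],
    "C8": ["C3", "C4", "C6", "C7"],
    "C9": ["C3"],
    "C10": ["C1", "C5"],
}
edges = {frozenset((u, v)) for u, vs in associations.items() for v in vs}

# Frequent sets reported for each minimum support level.
frequent = {
    2: [{"C1", "C2", "C3"}, {"C6", "C7", "C8"}, {"C2", "C3", "C5"}],
    3: [{"C2", "C3", "C5"}],
}


def union(sets):
    """All distinct customers appearing in any of the given sets."""
    return set().union(*sets)


def induced_edges(nodes):
    """Associations whose two customers both lie in `nodes`."""
    return {e for e in edges if e <= nodes}


core2 = union(frequent[2])
core3 = union(frequent[3])
print(sorted(core2, key=lambda c: int(c[1:])))
print(sorted(tuple(sorted(e)) for e in induced_edges(core3)))
```

The support-2 union reproduces result 712. The support-3 core [C2, C3, C5] forms a triangle, since every pair of its customers is associated.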
Item 800 in
Item 900 in
With reference to
Any combination of at least one computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having at least one wire, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects may be written in any combination of at least one programming language, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects herein are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments herein. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
A representative hardware environment for practicing the embodiments herein is depicted in
The flowchart and block diagrams in