Online social networks are communities on the Internet where people can come together to exchange information, ideas, and opinions. These online social networks (such as MSN Spaces) are rich with user-created text content, imported pictures, and music. In addition, several users of the online social network maintain a blog. In general, a blog is an online publication with regular posts, presented in reverse chronological order. The contents of a social network user's blog may concern any aspect of daily life, such as news, politics, business, and science. In addition, these blogs frequently act as a personal diary to record the user's interests, opinions, and events.
Most online social networks are quite large in scale. For example, one online social network has more than 58 million users. These users interconnect with each other, which builds up a very rich and useful social network for each user. A user's social network is his compilation of online friends. This personal social network may contain hundreds or even thousands of other users, along with complex and often unique links between the user and a friend. For example, a link between the user and an online friend may range from a casual acquaintance to a close family member. The link does not even need to be user-initiated; it may arise simply from another user in the community viewing the user's blog.
It is quite desirable to be able to analyze and mine information from a user's social networks within an online social community. For example, mining information about user-created content on blogs and each user's social network enables advertisers to better understand the different user groups within the community. The ultimate goal of the advertiser is more efficient ad targeting and product improvement. This mining provides an advertiser with rich and valuable intelligence to better understand social network users, optimize viral marketing, refine ad targeting, and expand behavioral segments. One problem, however, is that there is currently a dearth of applications (or end-user software) that allow an application user to visualize a user's social network and to mine the social network for information about the users and their interconnected online social relationships.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The social network visualization and mining system includes a visualization application for mining social networks of users in an online social community. In general, the social network visualization and mining system displays a graphic of a user's social network in a manner that is both efficient and useful. Moreover, this visualization can be used to mine the social network for additional information and intelligence. The mining of information includes the examination of user-created content and the relationships between users.
The social network visualization and mining system has several applications, including providing advertisers with knowledge and information about potential consumers to enable targeted advertising. By using the social network visualization and mining system, an advertiser can target its advertising of a product to obtain the highest return on its investment. The social network visualization and mining system may also be used to analyze and visualize other types of communities and networks.
The social network visualization and mining system includes a social network visualization module that displays the social network to an application user in graphical form. Smooth and effective user interfaces help the application user easily change focus between different users. In one embodiment, a two-dimensional (2-D) node-link graph is used to display the social network of a user. A center node is used to represent the primary social network user being examined, and secondary nodes represent the primary user's friends. Lines are used to represent the links between the primary user and these friends. Various visualization features such as line thickness, line color, and text size are used to enable the application user to easily identify the type of link between the primary user and his friends. In another embodiment, the structure of a social network is displayed in a layered tree format.
The social network visualization and mining system also includes a topics visualization module. This module builds and displays a social network based on a certain topic or keyword that is entered by the application user. For example, an advertiser may want to know which users are interested in baby products. A topic or keyword search by the advertiser may include the term “diapers” in order to identify users who are parents. The social network of each user interested in this topic then may be visualized using the social network visualization module. The users in this visualization form an excellent target community for viral marketing campaigns and ad targeting of relevant products or services.
The social network visualization and mining system also includes a demographic prediction module. Many users in an online social community give no or false demographic information. However, it can be important to advertisers to know the age, location, and gender of users. The demographic prediction module examines a user's social network to predict the demographics of the user. This allows an advertiser to use the social network visualization and mining system to target advertising by demographics to connect to the right audience.
It should be noted that alternative embodiments are possible, and that steps and elements discussed herein may be changed, added, or eliminated, depending on the particular embodiment. These alternative embodiments include alternative steps and alternative elements that may be used, and structural changes that may be made, without departing from the scope of the invention.
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
In the following description of the social network visualization and mining system, reference is made to the accompanying drawings, which form a part thereof, and in which is shown by way of illustration a specific example whereby the social network visualization and mining system may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the claimed subject matter.
Referring to
As shown in
The social network visualization and mining system 100 includes several interconnected modules. These modules include a social network visualization module 140, a topics visualization module 150, and a demographic prediction module 160. The social network visualization module 140 provides the application user (the user of the social network visualization and mining system 100) with a graphical representation of a user's social network. As explained in detail below, in one embodiment this graphical representation is a node-link graph. In another embodiment, the representation is in a layered tree format.
The topics visualization module 150 provides the application user with the ability to search for users' social networks by topic or keyword. As explained below, this gives the application user the ability to find users with the same interests. The demographic prediction module 160 makes predictions about a user's demographics (such as age, location, and gender). These predictions are based on the social network of a user and the demographic information of the user's friends. Each of these modules outputs its results and graphical displays to a user interface 170 for display to the application user.
Next, a graphical representation is used to visualize the social network of a user (box 210). The graphical representation is based on the content and link data. In one embodiment, the graphical representation is a node-link graph. Moreover, in some embodiments, the node-link graph is a hypergraph, such as one rendered with HyperGraph, an open source project. This type of graph allows an application user to easily explore a user's social network and quickly see the links between the user and his friends. In addition, the application user can change the user being examined in order to visualize another user's social network. In another embodiment, the node-link graph may be transformed into a layered tree format.
The graphical representation can be refined using a demographic prediction technique (box 220). If the user being examined did not give any demographic data, or the data is suspect, then the social network visualization and mining method predicts the user's age, location, and gender based on the demographic data of the user's friends and the user's social network. Additional refinement of the graphical representation is possible using topic discovery (box 230). Topic discovery displays a social network based on a desired topic or keyword, such that the displayed users are interested in that topic.
The social network visualization and mining system includes a social network visualization module. The social network visualization module represents each social network as a node-link graph. Each node of the node-link graph represents a user and each link represents a relationship between users. The relationship can be any type of social network interaction, such as an e-mail, blog, or instant messenger interaction. The social network visualization module thus makes visible the way in which users are linked in a social network, which is quite difficult to see in raw data form.
In one embodiment, the social network visualization module presents the structure of a social network in two-dimensional (2-D) space as a 2-D node-link graph. This 2-D node-link graph includes several features, including the ability to: (1) present the graph with various styles of nodes and edges (or lines); (2) handle a large-scale social network; and (3) present the social network structure in multiple forms.
In one embodiment of the social network visualization module, the nodes represent users in the social network. In this embodiment, the nodes are associated with a user identification (user ID). The position of a node in the 2-D node-link graph determines the structure and the shape of the graph. Each node is shown as a point (or dot) on the graph, while the user associated with a particular node is labeled as text (typically the user ID) near the node. Various colors and fonts are available for this text. In addition, in some embodiments the size of the text is used to indicate the distance between a center node and outlying nodes. The center node, which is capable of being changed by an application user, identifies the node currently being examined while the outlying nodes are those nodes in the social network of the user represented by the center node.
Lines are another element of the 2-D node-link graph, and are used to represent types of links between users. In other words, the type of social relationship between two users is indicated by the type of line used to join the two nodes representing the users. In one embodiment of the social network visualization module, the lines are solid. In other embodiments the width of a line can be used to indicate the importance of the social relationship between users. By way of example, in some embodiments a thicker line represents a stronger relationship between users, while a thinner line represents a weaker relationship (as compared to the thicker line).
In some embodiments, line color can also be used to represent various types of relationships between users. In one embodiment, an orange line indicates a “user-defined friend”, a green line indicates a “page view” (someone who has visited the user's blog or web page), a light blue line indicates a “blog comment” (someone who has commented on the user's blog), a purple line indicates a “blog trackback”, a yellow line indicates an “IM chat”, and a dark blue line indicates a “mixture”, meaning that there are at least two of the above types of relationships between the users.
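The line styling rules above can be sketched as two small mapping functions. This is a minimal illustration, not the system's implementation: the color table follows the embodiment just described, while the relationship-type strings, the normalized strength score in [0, 1], and the 1 to 6 width range are assumptions made for the example.

```python
from typing import Iterable

# Line colors from the embodiment above; two or more relationship types
# between the same pair of users are drawn as a "mixture".
RELATIONSHIP_COLORS = {
    "user-defined friend": "orange",
    "page view": "green",
    "blog comment": "light blue",
    "blog trackback": "purple",
    "IM chat": "yellow",
}
MIXTURE_COLOR = "dark blue"


def line_color(relationship_types: Iterable[str]) -> str:
    """Pick the line color for the relationship types linking two users."""
    kinds = set(relationship_types)
    if not kinds:
        raise ValueError("a link needs at least one relationship type")
    if len(kinds) >= 2:
        return MIXTURE_COLOR
    (kind,) = kinds
    return RELATIONSHIP_COLORS[kind]


def line_width(strength: float, min_width: float = 1.0, max_width: float = 6.0) -> float:
    """Map an assumed strength score in [0, 1] to a width; stronger links are thicker."""
    strength = min(max(strength, 0.0), 1.0)  # clamp out-of-range scores
    return min_width + strength * (max_width - min_width)
```

For example, a pair of users linked only by page views gets a green line, while a pair linked by both blog comments and IM chat gets a dark blue “mixture” line.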
In another embodiment, special layouts, such as shadows, can be used to indicate different node clusters. In another embodiment, icons can be used to indicate how many neighbors a user node has. In one embodiment, an icon having one star indicates that the user node has only a few neighbors, an icon having three stars indicates that the user node has a moderate number of neighbors (as compared to the icon having one star), and an icon having six stars indicates that the user node has many neighbors (as compared to the icons having one star and three stars).
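The star icons could be chosen with a simple thresholding function. The source does not say how many neighbors count as “a few” or “many”, so the `few` and `many` cutoffs below are purely hypothetical placeholders.

```python
def star_icon(neighbor_count: int, few: int = 10, many: int = 100) -> str:
    """Return a 1-, 3-, or 6-star icon for a node's neighbor count.

    The `few` and `many` cutoffs are hypothetical; the source gives no values.
    """
    if neighbor_count < few:
        return "*"  # only a few neighbors
    if neighbor_count < many:
        return "***"  # a moderate number of neighbors
    return "******"  # many neighbors
```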
The social network visualization module is capable of displaying a social network having up to 1,000 nodes, including the complicated lines among those nodes, on a node-link graph. The social network visualization module optimizes the node positions so that the line structure is visualized in a clear and elegant way.
The social network visualization module is capable of displaying a social network in a variety of display formats. In one embodiment, the social network is displayed in raw format. In another embodiment, the social network is displayed in a tree format. This tree format presents a social network user's social network connections in a hierarchical structure that conveys a clear and organized view of how other social network users are connected to this specific social network user.
In order to reorganize a social network user's connections from the raw format to a tree format, the social network visualization module includes a tree building technique. In one embodiment, this tree building technique uses a layered approach. For example, suppose that all direct connections of a user node U corresponding to a social network user are selected and laid out on a first layer of a node-link graph. Next, a user node A on the first layer is randomly selected. All direct connections of user node A that are not yet displayed on the graph are then laid out as the first layer of user node A (and the second layer of user node U). This process is repeated for the remaining user nodes on the first layer. The same process is used to extend the tree to the third layer and any additional layers. The tree building technique is completed when all nodes are put in the tree.
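A minimal sketch of the layered tree building technique follows, assuming the social network is supplied as an undirected adjacency map (each user to the set of that user's direct connections). The function name `build_layered_tree` and the `seed` parameter are illustrative. Nodes on each layer are expanded in random order, as described above; nodes unreachable from the root user are never placed, so the loop stops when a layer comes up empty rather than when every node in the map is in the tree.

```python
import random
from typing import Dict, Hashable, List, Optional, Set, Tuple

# Undirected adjacency map: each user -> set of that user's direct connections.
Graph = Dict[Hashable, Set[Hashable]]


def build_layered_tree(
    graph: Graph, root: Hashable, seed: Optional[int] = None
) -> Tuple[Dict[Hashable, Optional[Hashable]], List[List[Hashable]]]:
    """Lay out `root`'s network as a tree; return (parent map, layers).

    layers[0] holds the root; layers[1] is the first layer of direct connections.
    """
    rng = random.Random(seed)
    parent: Dict[Hashable, Optional[Hashable]] = {root: None}
    layers: List[List[Hashable]] = [[root]]
    while layers[-1]:
        order = list(layers[-1])
        rng.shuffle(order)  # expand this layer's nodes in random order
        next_layer: List[Hashable] = []
        for node in order:
            # Sorting neighbors makes the result reproducible for a given seed.
            for neighbor in sorted(graph.get(node, ()), key=str):
                if neighbor not in parent:  # not yet displayed on the graph
                    parent[neighbor] = node
                    next_layer.append(neighbor)
        layers.append(next_layer)
    layers.pop()  # drop the final, empty layer
    return parent, layers
```

Because every node is discovered while expanding the layer just above it, the random visiting order only changes which parent a shared friend hangs under, not the layer on which that friend appears.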
The social network visualization and mining system includes a topics visualization module. In one embodiment, the topics visualization module allows the identification of groups of users having common interests and the connections among them. By way of example, social networks can be built for social network users who are blogging about the topic “xbox”. In one embodiment, the topics visualization module displays the largest isolated social network identified. In such an embodiment, the node in the middle of the node-link graph is the user having the most outgoing links.
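The largest isolated topic network and its center node might be found as follows. This sketch assumes the links are directed (source, target) pairs and that the set of users blogging about the topic (for example, “xbox”) has already been identified. Treating an “isolated social network” as a weakly connected component of the links among those users is an interpretation made for this example, not a detail given in the text.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple


def topic_network(
    links: Iterable[Tuple[Hashable, Hashable]], topic_users: Set[Hashable]
) -> Tuple[Set[Hashable], Optional[Hashable]]:
    """Return (members, center) of the largest isolated network among topic users."""
    out_degree: Dict[Hashable, int] = defaultdict(int)
    undirected: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for src, dst in links:
        # Keep only links whose endpoints both blog about the topic.
        if src in topic_users and dst in topic_users and src != dst:
            out_degree[src] += 1
            undirected[src].add(dst)
            undirected[dst].add(src)

    # Find weakly connected components with an iterative depth-first search.
    seen: Set[Hashable] = set()
    best: Set[Hashable] = set()
    for start in topic_users:
        if start in seen:
            continue
        component = {start}
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbor in undirected[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    stack.append(neighbor)
        if len(component) > len(best):
            best = component

    # The center node is the member with the most outgoing links.
    center = max(best, key=lambda user: out_degree[user]) if best else None
    return best, center
```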
In another embodiment, the topics visualization module can identify a social network user's complete extended social network and visualize it up to certain network layers.
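Visualizing an extended social network “up to certain network layers” amounts to a breadth-first search with a depth limit. In this hypothetical helper, `max_layers` is the number of hops to include and the graph is an adjacency map from each user to that user's direct connections; the return value records the layer on which each reached user appears.

```python
from collections import deque
from typing import Dict, Hashable, Set


def extended_network(
    graph: Dict[Hashable, Set[Hashable]], user: Hashable, max_layers: int
) -> Dict[Hashable, int]:
    """Return {user: layer} for everyone within `max_layers` hops of `user`."""
    layer = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if layer[node] == max_layers:
            continue  # do not expand past the requested depth
        for neighbor in graph.get(node, ()):
            if neighbor not in layer:
                layer[neighbor] = layer[node] + 1
                queue.append(neighbor)
    return layer
```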
The social network visualization and mining system includes a demographic prediction module that predicts the demographics of a social network user, even if the user has provided no demographic information or has provided erroneous demographic information. This demographic information includes age, location, and gender. Not all social network users provide their demographic information, and some of those who do may provide information that is not true. The demographic prediction module predicts these users' demographic features using their social network structures and blog contents.
Accurately predicting demographic information for a user can be quite beneficial for the application user who is an advertiser. By targeting the right demographic group, an advertiser is more likely to find social network users that are interested in its products and willing to click on its advertisements. Moreover, users are more likely to accept advertisements delivered through their blogs that match their interests. By way of example, an 18-year-old male blogger typically will be much happier to see an Xbox advertisement on his blog page than an advertisement for dentures.
In addition, knowing the age and gender of social network users allows an advertiser to tailor its message to different demographic groups. For example, in some cases women tend to use different terminology than men and respond better to advertisements having a more female-oriented message. Location targeting can help businesses that rely on local traffic to reach locally relevant social network users.
In one embodiment, the demographic prediction module is used to evaluate the demographic distributions of users who are interested in a certain topic or keyword. The results can serve as a powerful demographic targeting suggestion tool for advertisers to optimize their advertisement campaigns. By way of example, an advertiser may be interested in bidding for the keyword “women shoes”. If the demographic prediction module shows a 4/1 ratio for female versus male users interested in the topic within the social network, then the advertiser can choose to target female users only. Demographic distributions calculated using other data sources, such as search terms, can be used for the same purpose as well.
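The 4/1 example can be made concrete with a small helper that tallies the predicted genders of users interested in a topic and suggests a target group. The function names and the `min_ratio` cutoff of 4.0 are assumptions chosen to mirror the example above, not parameters defined by the system.

```python
from collections import Counter
from typing import Dict, Iterable


def gender_distribution(genders: Iterable[str]) -> Dict[str, float]:
    """Fraction of topic users predicted to be each gender."""
    counts = Counter(genders)
    total = sum(counts.values())
    return {gender: count / total for gender, count in counts.items()} if total else {}


def suggest_gender_target(genders: Iterable[str], min_ratio: float = 4.0) -> str:
    """Suggest 'female', 'male', or 'all' from the female-to-male ratio."""
    counts = Counter(genders)
    female, male = counts["female"], counts["male"]
    if female > 0 and female >= min_ratio * male:
        return "female"
    if male > 0 and male >= min_ratio * female:
        return "male"
    return "all"  # no group dominates strongly enough
```

With eight female and two male users interested in “women shoes”, the ratio is exactly 4:1, so `suggest_gender_target` returns `"female"`.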
The demographic prediction module predicts the age of a social network user by assuming that the user is approximately the same age as his or her friends within the social network. Typically, a greater percentage of social network users are younger adults. These younger users are more likely to have friends in the same age group as compared to older users.
In order to predict a user's age, the demographic prediction module determines all friends of a user based on that user's social network structure. In one embodiment, if a user has at least three direct neighbors with known ages, then the demographic prediction module predicts the user's age to be the median of all the neighbors' ages. In this embodiment, the median is selected as the prediction not only because it is simple to understand and easy to calculate, but also because it is more robust in the presence of outlier values than the mean. By way of example, consider a 21-year-old female having nine friends ages 19, 20, 21, 21, 21, 22, 22, 23, and 55. Assume that the first eight are high school and college friends, while the last friend is her uncle. In this case, the demographic prediction module predicts an age of 21, while taking the mean yields a predicted age of approximately 25 (224/9 ≈ 24.9).
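The median rule can be written in a few lines with Python's standard `statistics` module. The `min_known=3` default mirrors the embodiment above; representing unknown neighbor ages as `None` is an assumption of this sketch.

```python
from statistics import median
from typing import Iterable, Optional, Union


def predict_age(
    neighbor_ages: Iterable[Optional[int]], min_known: int = 3
) -> Optional[Union[int, float]]:
    """Predict a user's age as the median of the neighbors' known ages.

    Returns None when fewer than `min_known` neighbor ages are known.
    """
    known = [age for age in neighbor_ages if age is not None]
    if len(known) < min_known:
        return None
    return median(known)  # robust to outliers such as an older relative
```

With the friends' ages from the example, `predict_age` returns 21, whereas the mean of the same nine ages is about 24.9.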
Similar to age prediction, the demographic prediction module predicts a user's location by assuming that a user's friends generally reside in the same local area as the user. Thus, in one embodiment, a user's location is predicted by voting among the locations recorded in his or her neighbors' profiles. The predicted location is the most common location among the user's neighbors.
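Majority voting over neighbors' profile locations can be sketched with `collections.Counter`. Empty or missing locations are skipped, and ties resolve to the location encountered first; both behaviors are choices of this sketch rather than details from the text.

```python
from collections import Counter
from typing import Iterable, Optional


def predict_location(neighbor_locations: Iterable[Optional[str]]) -> Optional[str]:
    """Predict a user's location as the most common location among neighbors."""
    votes = Counter(loc for loc in neighbor_locations if loc)  # skip None and ""
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```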
The demographic prediction module uses a social network blog categorization technique to predict a user's gender. This categorization technique allows each blog to be categorized into one or more predefined categories. In addition, in one embodiment, each category is assigned a probability of “male” and a probability of “female”. In this embodiment, the demographic prediction module sums the male and female probabilities over the categories of the user's blog to obtain a probability of the user's gender. In other embodiments, other identifiers can be used instead of categories, such as keywords extracted from blogs, the most frequent terms used by the user, and the user's age and the gender information of the user's neighbors.
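The category-probability summation might look like the following. The category names and their (P(male), P(female)) values in `CATEGORY_GENDER_PROBS` are invented for illustration; in practice they would have to be estimated from users whose gender is known. Normalizing the two sums so they add to 1 is likewise an assumption about how the sums become “a probability of a user's gender”.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical (P(male), P(female)) per blog category, for illustration only.
CATEGORY_GENDER_PROBS: Dict[str, Tuple[float, float]] = {
    "games": (0.8, 0.2),
    "sports": (0.7, 0.3),
    "fashion": (0.2, 0.8),
    "cooking": (0.35, 0.65),
}


def predict_gender(
    blog_categories: Iterable[str],
    table: Dict[str, Tuple[float, float]] = CATEGORY_GENDER_PROBS,
) -> Optional[Dict[str, float]]:
    """Sum per-category probabilities and normalize them into P(male), P(female)."""
    male = female = 0.0
    for category in blog_categories:
        if category in table:  # unknown categories contribute nothing
            p_male, p_female = table[category]
            male += p_male
            female += p_female
    total = male + female
    if total == 0:
        return None
    return {"male": male / total, "female": female / total}
```

A blog categorized as “games” and “sports” sums to 1.5 for male and 0.5 for female, giving P(male) = 0.75 under these made-up values; the predicted label is then whichever key has the larger probability.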
The social network visualization and mining system is designed to operate in a computing environment. The following discussion is intended to provide a brief, general description of a suitable computing environment in which the social network visualization and mining system may be implemented.
The social network visualization and mining system is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the social network visualization and mining system include, but are not limited to, personal computers, server computers, hand-held, laptop or mobile computer or communications devices such as cell phones and PDAs, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The social network visualization and mining system may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The social network visualization and mining system may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices. With reference to
Components of the computer 810 may include, but are not limited to, a processing unit 820 (such as a central processing unit, CPU), a system memory 830, and a system bus 821 that couples various system components including the system memory to the processing unit 820. The system bus 821 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, also known as Mezzanine bus.
The computer 810 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by the computer 810 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 810. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
Note that the term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 830 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 831 and random access memory (RAM) 832. A basic input/output system 833 (BIOS), containing the basic routines that help to transfer information between elements within the computer 810, such as during start-up, is typically stored in ROM 831. RAM 832 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 820. By way of example, and not limitation,
The computer 810 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 841 is typically connected to the system bus 821 through a non-removable memory interface such as interface 840, and magnetic disk drive 851 and optical disk drive 855 are typically connected to the system bus 821 by a removable memory interface, such as interface 850.
The drives and their associated computer storage media discussed above and illustrated in
Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, radio receiver, or a television or broadcast video receiver, or the like. These and other input devices are often connected to the processing unit 820 through a user input interface 860 that is coupled to the system bus 821, but may be connected by other interface and bus structures, such as, for example, a parallel port, game port or a universal serial bus (USB). A monitor 891 or other type of display device is also connected to the system bus 821 via an interface, such as a video interface 890. In addition to the monitor, computers may also include other peripheral output devices such as speakers 897 and printer 896, which may be connected through an output peripheral interface 895.
The computer 810 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 880. The remote computer 880 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 810, although only a memory storage device 881 has been illustrated in
When used in a LAN networking environment, the computer 810 is connected to the LAN 871 through a network interface or adapter 870. When used in a WAN networking environment, the computer 810 typically includes a modem 872 or other means for establishing communications over the WAN 873, such as the Internet. The modem 872, which may be internal or external, may be connected to the system bus 821 via the user input interface 860, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 810, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
The foregoing Detailed Description has been presented for the purposes of illustration and description. Many modifications and variations are possible in light of the above teaching. It is not intended to be exhaustive or to limit the subject matter described herein to the precise form disclosed. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims appended hereto.