In computing, a social network is a computer service that provides an online platform configured to allow people to interact with other people or with content hosted on the social network via a computer network such as the Internet. For example, a user can utilize a social network to broadcast messages via a social network account. Such posts can contain text, photos, videos, audio, or other suitable types of electronic content. In response, other users of the social network can repost, reply to, comment on, like, or perform other actions on the original messages. Such interactions can allow the users to share similar interests, activities, ideologies, or real-life connections.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Relational closeness of users in a social network is useful for the social network to provide relevant suggestions of content, potential connections, or other information to the users in the social network. For example, the social network can recommend a website or webpage to a user when other users connected to the user have visited the website or webpage. In another example, the social network can also recommend additional users as potential connections to a user based on connections of other users. In a further example, the social network can also recommend a team, group, department, or other types of organization to a user when connected users are members of such an organization. Such recommendations can allow the user to access relevant information, establish new connections, and otherwise enrich user experience of the social network.
Various metrics have been developed to gauge relational closeness of users in a social network. For example, one closeness metric is based on numbers of connections or “hops” connecting two users in a social network. For example, when a first user is directly connected to a second user in a social network, the closeness metric can be set to one because only one hop is needed for the first user to reach the second user in the social network. In another example, when a first user is connected to a second user via one or more intermediate users, the closeness metric can be set to two or more because two or more hops are needed for the first user to reach the second user in the social network. In other examples, closeness metrics can also be based on types of relationships (e.g., managers and subordinates), length of connections (e.g., years being connected), and other suitable parameters.
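The hop-based closeness metric described above can be sketched as a breadth-first search over a connection list. This is an illustrative sketch only; the function name `hop_distance` and the sample `connections` mapping are assumptions introduced here and are not part of the disclosure.

```python
from collections import deque


def hop_distance(connections, source, target):
    """Return the number of hops from source to target, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        user, hops = queue.popleft()
        for neighbor in connections.get(user, ()):
            if neighbor == target:
                return hops + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


# Hypothetical connections: A-B and B-C are connected; D has no connections.
connections = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}

print(hop_distance(connections, "A", "B"))  # 1 (directly connected)
print(hop_distance(connections, "A", "C"))  # 2 (one intermediate user)
print(hop_distance(connections, "A", "D"))  # None (not connected at all)
```

A metric of this kind returns no value for unconnected users regardless of how often those users exchange messages, which is the limitation addressed in the following paragraphs.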
The existing closeness metrics, however, do not typically consider degrees of interactions between pairs of users in the social network when gauging relational closeness of the users. For example, a first user may be remote to a second user in a social network because many hops are needed for the first user to reach the second user in the social network. Alternatively, the first user may not even be connected to the second user at all. However, the first user has frequently exchanged emails, text messages, or other suitable types of interactions with the second user. Such frequent interactions would indicate closeness of the first user to the second user. However, the existing closeness metrics may deem the first and second users as not closely related despite such frequent interactions due to the remoteness or the lack of connections between the first and second users.
Several embodiments of the disclosed technology can address certain aspects of the foregoing drawbacks by providing a data processor configured to perform social distance quantification based on user interactions via graph embedding. In certain implementations, the data processor can include a telemetry monitor, a graph inducer, a graph analyzer, and a social distance quantifier operatively coupled to one another. In other implementations, at least one of the foregoing components can be separate from the data processor. In further implementations, the data processor can also include other suitable components in addition to or in lieu of the foregoing components of the data processor.
The telemetry monitor can be configured to detect interactions of users in a social network. Such interactions can be with other users of the social network, with content (e.g., documents) hosted in the social network, or with teams, groups, or other suitable types of organizations in the social network. For example, the telemetry monitor can be configured to detect that a first user has exchanged several emails with a second user in addition to exchanging instant messages with other users in the social network. Such monitoring can be performed with user consent to protect user privacy, and users may opt out of the monitoring. Upon detecting such interactions, the telemetry monitor can be configured to generate database records corresponding to the detected interactions. For example, a database record can include suitable data fields corresponding to a type (e.g., email), date/time, recipient, or other suitable parameters of an interaction. The telemetry monitor can also be configured to compile the database records as a dataset of interactions. In other implementations, the telemetry monitor can be separate from the data processor and instead can be configured to allow the data processor to access the compiled dataset of interactions via, for instance, an Application Programming Interface (API).
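One possible shape for such database records is sketched below. The class name `InteractionRecord`, its field names, the `compile_dataset` helper, and the choice to order the dataset by time are all assumptions made for illustration; the disclosure only requires fields such as type, date/time, and recipient.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InteractionRecord:
    """Hypothetical record for one detected interaction."""
    kind: str            # type of interaction, e.g. "email"
    sender: str          # account that initiated the interaction
    recipients: tuple    # accounts the interaction was addressed to
    timestamp: datetime  # date/time of the interaction


def compile_dataset(events):
    """Convert raw event dictionaries into records ordered by time (an assumption)."""
    records = [
        InteractionRecord(
            kind=event["kind"],
            sender=event["sender"],
            recipients=tuple(event["recipients"]),
            timestamp=event["timestamp"],
        )
        for event in events
    ]
    return sorted(records, key=lambda record: record.timestamp)


# Made-up events for illustration only.
events = [
    {"kind": "email", "sender": "user_a", "recipients": ["user_b", "user_c"],
     "timestamp": datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc)},
    {"kind": "instant_message", "sender": "user_b", "recipients": ["user_a"],
     "timestamp": datetime(2024, 1, 1, 17, 5, tzinfo=timezone.utc)},
]
dataset = compile_dataset(events)
for record in dataset:
    print(record.kind, record.sender, record.recipients)
```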
The graph inducer of the data processor can be configured to induce the compiled dataset of interactions into an interaction graph. In certain embodiments, the graph inducer can represent a user (or a corresponding email or other types of user account) as a vertex in a graph and each detected interaction as an edge between two or more vertices. For example, a first vertex can represent a first user while a second vertex can represent a second user. A directional edge can connect the first and second vertices to represent an email the first user sent to the second user. Another directional edge can connect the first vertex to a third vertex corresponding to a third user when the first user sent the same or different email to the third user. Thus, an edge pointing from the first vertex to the second or third vertex can represent a detected email transmitted from the first user to the second or third user. Another edge pointing from the second or third vertex to the first vertex can represent an email reply from the second or third user to the first user.
In certain embodiments, the edge can be weighted, for instance, based on how many recipients an email is addressed to. For example, an email addressed to only one user can be assigned a higher weight than an email addressed to many users. Thus, in one example, the graph inducer can be configured to assign a weight to an edge that is an inverse of the number of recipients the email is addressed to. In other examples, the graph inducer can be configured to assign a fixed weight (e.g., one) to all emails while filtering out emails addressed to more than a threshold number (e.g., four) of recipients. In additional embodiments, the edge can be weighted based on whether a reply to the email is received, an elapsed time between transmission of the email and a reply to the email, or other suitable parameters of the email interactions and/or in other suitable manners.
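The two weighting schemes described above can be sketched as follows. The function `induce_graph`, its `max_recipients` parameter, and the decision to sum the weights of repeated emails between the same pair of users into a single directed edge are assumptions made for this illustration.

```python
from collections import defaultdict


def induce_graph(emails, max_recipients=None):
    """Build directed, weighted edges {(sender, recipient): weight} from emails.

    With max_recipients=None, each email contributes 1 / (number of recipients)
    to every sender-to-recipient edge. With a threshold, emails addressed to more
    recipients than the threshold are dropped and the rest contribute a fixed 1.0.
    """
    edges = defaultdict(float)
    for sender, recipients in emails:
        count = len(recipients)
        if count == 0:
            continue
        if max_recipients is None:
            weight = 1.0 / count
        elif count > max_recipients:
            continue  # filter out broadly addressed emails
        else:
            weight = 1.0
        for recipient in recipients:
            edges[(sender, recipient)] += weight
    return dict(edges)


# Made-up emails as (sender, recipients) pairs.
emails = [
    ("A", ["B"]),
    ("A", ["B", "C"]),
    ("B", ["A"]),
    ("A", ["B", "C", "D", "E", "F"]),
]

inverse_weights = induce_graph(emails)
fixed_weights = induce_graph(emails, max_recipients=4)
print(sorted(inverse_weights.items()))
print(sorted(fixed_weights.items()))
```

Under the inverse scheme the five-recipient email still contributes a small weight to each edge, whereas under the fixed scheme with a threshold of four it is ignored entirely.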
Upon inducing the interaction graph from the dataset of interactions, the graph analyzer can be configured to apply graph embedding to the induced interaction graph to generate a vertex level tensor-based embedding for each user represented by the vertices in the interaction graph. In computing, graph embedding generally refers to techniques used to transform vertices, edges, and associated features (e.g., represented by weights of edges) into tensors in a vector space of certain dimensions (e.g., 256 dimensions) while maximally preserving graph structure and information.
According to one graph embedding technique, a vertex in a graph can be represented as a combination of non-linear transformations of an aggregation of features from connected neighbors and ultimately the entire latent space of the graph of the vertex and features of the vertex itself. For example, when user A is connected to users B, C, and D in the interaction graph, features of vertex A representing user A can be computed as a non-linear transformation of an aggregate of features from vertices corresponding to users B, C, and D combined with a non-linear transformation of features of vertex A by applying encoding functions. When users B, C, and D are connected to additional users in the interaction graph, each user B, C, and D can be represented similarly as aggregations of features from respective neighbors by applying additional encoding functions. As such, each vertex in the interaction graph can have a corresponding computational graph that captures neighborhood structure of the interaction graph around the vertex as well as features of the vertex and corresponding neighbors.
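A single round of the neighborhood aggregation described above might look like the following sketch. Several choices here are assumptions made for illustration and are not taken from the disclosure: the aggregate is a mean, the non-linear transformation is `tanh` applied after a matrix multiplication, the two transformed parts are combined by concatenation, and the weight matrices are fixed identity matrices rather than learned encoding functions.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode_vertex(features, neighbors, vertex, w_self, w_neigh):
    """Combine a transformed neighbor aggregate with transformed own features."""
    own = features[vertex]
    adjacent = neighbors[vertex]
    if adjacent:
        # Mean of neighbor features, one value per feature dimension.
        aggregate = [
            sum(features[n][i] for n in adjacent) / len(adjacent)
            for i in range(len(own))
        ]
    else:
        aggregate = [0.0] * len(own)
    self_part = [math.tanh(v) for v in mat_vec(w_self, own)]
    neigh_part = [math.tanh(v) for v in mat_vec(w_neigh, aggregate)]
    return self_part + neigh_part  # concatenation as the "combination"


# User A is connected to users B, C, and D; feature values are made up.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 1.0], "D": [1.0, 1.0]}
neighbors = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
identity = [[1.0, 0.0], [0.0, 1.0]]

embedding_a = encode_vertex(features, neighbors, "A", identity, identity)
print(embedding_a)
```

Applying the same function again to the outputs of a first round would let each vertex draw on features from neighbors of neighbors, which is the sense in which each vertex has its own computational graph.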
The encoding functions for both the aggregate features of connected neighbors and the vertex itself can be developed via machine learning by, for example, using a “neural network” or “artificial neural network” configured to “learn” or progressively improve performance of tasks by studying known examples. In certain implementations, a neural network can include multiple layers of objects generally referred to as “neurons” or “artificial neurons.” Each neuron can be configured to perform a function, such as a non-linear activation function, based on one or more inputs via corresponding connections. Artificial neurons and connections typically have a contribution value that adjusts as learning proceeds. The contribution value increases or decreases a strength of an input at a connection. Typically, artificial neurons are organized in layers. Different layers may perform different kinds of transformations on respective inputs. Signals typically travel from an input layer to an output layer, possibly after traversing one or more intermediate layers.
Training the neural network to develop the encoding functions can be performed in a supervised, semi-supervised, or unsupervised fashion. For example, the neural network can be trained using a loss function based on random walks (e.g., node2vec, DeepWalk, struc2vec, etc.), graph factorization, or node proximity in a graph. During graph embedding, the graph analyzer can also be configured to tune how the neural network performs graph embedding. For example, when performing random walks within node2vec, the graph analyzer can apply differing numbers of random walks based on a degree centrality of a vertex in the graph. As such, oversampling of vertices with very few connections could be prevented. In another example, when performing window calculations (e.g., how many vertices are considered in a window), the graph analyzer can be configured to use a very narrow window size (e.g., two). Thus, by using such techniques, the graph analyzer can develop encoding functions for both the aggregate of connected neighbors as well as for each vertex itself. The encoding functions (or machine learning models) can then be used to convert each vertex in the graph into a tensor in a vector space individually representing a position and/or a level of interaction between users in the social network.
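The idea of applying differing numbers of random walks based on degree centrality can be sketched as below. This sketch uses plain uniform random walks (closer to DeepWalk) rather than the biased walks of node2vec, and the scaling rule in `walks_per_vertex` (a maximum walk count multiplied by degree centrality, with at least one walk per vertex) is an assumption chosen for illustration. All function names and parameters are hypothetical.

```python
import random


def walks_per_vertex(adjacency, max_walks=10):
    """Scale the number of walks started at each vertex by its degree centrality."""
    total = len(adjacency)
    counts = {}
    for vertex, adjacent in adjacency.items():
        centrality = len(adjacent) / (total - 1) if total > 1 else 0.0
        counts[vertex] = max(1, round(max_walks * centrality))
    return counts


def random_walk(adjacency, start, length, rng):
    """Take a uniform random walk of up to `length` vertices from `start`."""
    walk = [start]
    while len(walk) < length:
        adjacent = adjacency[walk[-1]]
        if not adjacent:
            break  # dead end: no outgoing edges
        walk.append(rng.choice(adjacent))
    return walk


def generate_walks(adjacency, length=5, max_walks=10, seed=0):
    """Generate walks, starting fewer of them at sparsely connected vertices."""
    rng = random.Random(seed)
    counts = walks_per_vertex(adjacency, max_walks)
    return [
        random_walk(adjacency, vertex, length, rng)
        for vertex, count in counts.items()
        for _ in range(count)
    ]


# Hypothetical star-shaped interaction graph centered on user A.
adjacency = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
counts = walks_per_vertex(adjacency)
walks = generate_walks(adjacency)
print(counts)
print(len(walks))
```

With this rule the well-connected vertex A starts ten walks while each single-connection vertex starts three, so sparsely connected vertices contribute fewer training samples.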
Upon obtaining the tensors corresponding to the users represented by the vertices in the graph, the social distance quantifier can be configured to quantify a social distance based on interactions between a pair of users using corresponding tensors. For example, the social distance can be computed using tensor distance metrics such as dot product distance, cosine similarity, or Euclidean distance. As such, for each user, the social distance quantifier can produce a set of tensor distances. Based on the tensor distances, the social distance quantifier can be configured to rank other users in the social network for closeness to a user based on interactions among the users. Though the technique is described above in the context of user interactions with other users, similar techniques can also be applied in the context of user interactions with content or interactions with teams, groups, or other suitable types of organizations on the social network or other suitable types of computing network.
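The distance computation and ranking described above can be sketched as follows. The function names and the two-dimensional sample embeddings are assumptions for illustration; real embeddings would have many more dimensions (e.g., 256). Cosine similarity is converted to a distance here as one minus the similarity so that smaller values mean closer users under both metrics.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero embedding vectors."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - sum(a * b for a, b in zip(u, v)) / norm


def rank_by_closeness(embeddings, user, distance=euclidean_distance):
    """Return (other_user, distance) pairs sorted from closest to farthest."""
    pairs = [
        (other, distance(embeddings[user], vector))
        for other, vector in embeddings.items()
        if other != user
    ]
    return sorted(pairs, key=lambda pair: pair[1])


# Made-up two-dimensional embeddings for four users.
embeddings = {
    "A": [1.0, 0.0],
    "B": [2.0, 0.0],
    "C": [4.0, 4.0],
    "D": [1.0, 2.0],
}

by_euclidean = rank_by_closeness(embeddings, "A")
by_cosine = rank_by_closeness(embeddings, "A", cosine_distance)
print([name for name, _ in by_euclidean])  # ['B', 'D', 'C']
print([name for name, _ in by_cosine])     # ['B', 'C', 'D']
```

The two metrics order users C and D differently in this example, so the choice of metric affects the resulting ranking.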
Several embodiments of the disclosed technology can be applied to resolve the technical issue of quantifying degrees of interactions in a social network using machine learning. By constructing the interaction graph, interaction features or properties, such as exchange of emails, the number of emails exchanged, recency of exchanged emails, etc., can be graphically represented. The graphically represented interaction features can then be converted into vertex level tensors in a vector space via graph embedding. Using the vertex level tensors, the data processor can readily determine social distances based on interactions between the users as tensor distances between pairs of the tensors. As such, social distance values corresponding to degrees of interactions of the users can be readily quantified and visualized.
The quantified social distance values can be useful in providing suggestions of potential connections, content, or organizations based on user interactions in a social network. For example, when a new user joins a team to replace a previous user, the data processor can be configured to suggest to the new user potential connections of other users in the team according to a ranking of social distance of the other users to the previous user. In another example, when a new user joins a team but not to replace any other users in the team, an average, median, or other suitable values of tensors from other users can be used to calculate estimated social distances from users in the team. As such, the new user is likely to quickly establish valuable relationships with other users in the team with whom the new user is likely to interact. In further examples, the data processor can also be configured to similarly suggest content items or groups to the new user such that the new user can be exposed to likely useful information or activities.
Certain embodiments of systems, devices, components, modules, routines, data structures, and processes for interaction based social distance quantification are described below. In the following description, specific details of components are included to provide a thorough understanding of certain embodiments of the disclosed technology. A person skilled in the relevant art will also understand that the technology can have additional embodiments. The technology can also be practiced without several of the details of the embodiments described below with reference to
Many terminologies are used herein to illustrate various aspects of the disclosed technology. Such terminologies are intended as examples and not definitions. For instance, a computing platform can be a computing facility having a computer network interconnecting a plurality of servers or hosts to one another or to external networks (e.g., the Internet). An example of such a computing facility can include a datacenter for providing cloud computing services. A computer network can include a plurality of network devices. A network device can be a physical network device, examples of which include routers, switches, hubs, bridges, load balancers, security gateways, or firewalls. A host or host device can include a computing device that is configured to implement, for instance, one or more virtual machines, containers, or other suitable virtualized components. For example, a host can include a remote server having a hypervisor configured to support one or more virtual machines, containers, or other suitable types of virtual components. In another instance, a host can also include a desktop computer, a laptop computer, a smartphone, a web-enabled appliance (e.g., a camera), or other suitable computing devices configured to implement one or more containers or other suitable types of virtual components.
In another example, a computing service or cloud service can include one or more computing resources provided over a computer network such as the Internet. Example cloud services include software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). SaaS is a software distribution technique in which software applications are hosted by a cloud service provider in, for instance, datacenters, and accessed by users over a computer network. PaaS generally includes delivery of operating systems and associated services over the computer network without requiring downloads or installation. IaaS generally includes outsourcing equipment used to support storage, hardware, servers, network devices, or other components, all of which are made accessible over a computer network.
In addition, a social network can include a computer network and/or associated computer service that provides an online platform configured to allow users to interact with other users or with content hosted on the social network via a computer network such as the Internet. Interactions on such a social network can include exchanging emails, instant messages, text messages, VoIP calls, etc. between users, as well as accessing, editing, creating, or otherwise manipulating documents, videos, voice recordings, or other suitable types of content items. Various interactions of the users in the social network can be represented by a graph structure or other suitable types of data structures. In a graph structure, multiple vertices can correspond to respective users, content items, or groups in a social network. The graph can also include edges that connect pairs of the vertices. The edges can include both a direction and a weight. As discussed in more detail later, the direction can correspond to a direction of interaction between users while the weight can be assigned according to various criteria.
The client devices 102 can individually include a computing device that facilitates access to various resources, such as emails, social network services, file management services via the computer network 104 by the users 101 (identified as first, second, and third users 101a-101c). For example, in the illustrative embodiment, the first computing device 102a includes a laptop computer. The second computing device 102b includes a desktop computer. The third computing device 102c includes a tablet computer. In other embodiments, the client devices 102 can also include smartphones or other suitable computing devices. Even though three users 101 are shown in
The computing platform 108 can be configured to facilitate interactions among the users 101 as well as between the users 101 and content items hosted in the computing platform 108. For example, as shown in
The file management servers 103 can be configured to implement certain policies to facilitate access of the documents 105 by the users 101 via the computer network 104. For example, in one embodiment, the file management servers 103 can implement access control policies such that certain class, type, category, or other suitable grouping of the documents 105 can be accessible to certain users 101. In another embodiment, the file management servers 103 can also implement file retention policies such that certain class, type, category, or other suitable grouping of the documents 105 can be automatically deleted or purged from the network storage 111. In further embodiments, the file management servers 103 can implement other suitable types of policies to regulate storing, editing, accessing, purging, or other suitable operations on the documents 105.
The email servers 106 can be configured to run suitable applications that are configured to facilitate email interactions among the users 101. For example, the email servers 106 can be configured to receive incoming emails 113 from senders and forward outgoing emails 113 to recipients via the computer network 104. In certain implementations, the email servers 106 can be configured to maintain and/or access one or more inboxes for corresponding users 101 at the network repository 107. Periodically or upon demand, the email servers 106 can be configured to receive and forward emails 113 from the inboxes to the client devices 102 of the users 101.
The data processor 110 can be configured to monitor interactions of the users 101 on the computing platform 108 and perform interaction based social distance quantification for the users 101. For example, as shown in
In certain embodiments, the data processor 110 can be configured to query the file management servers 103 and the email servers 106 for the interaction data 109. In other embodiments, the file management servers 103 and the email servers 106 can individually include a reporting agent (not shown) that collects and transmits the interaction data 109 to the data processor 110. In further embodiments, other suitable arrangements may be used to collect the interaction data 109 from the file management servers 103 and the email servers 106. In yet further embodiments, the data processor 110 can also be configured to collect and monitor interaction data 109 related to interactions via instant messaging, online meetings, VoIP calls, or via other communication channels. With the received interaction data 109, the data processor 110 can be configured to implement social distance quantification based on user interactions via graph embedding such that social distance values for each pair of the users 101 can be derived, as described in more detail below with reference to
As shown in
Upon detecting such interactions, the telemetry monitor 112 can be configured to generate database records 121 corresponding to the detected interactions. As shown in
The graph inducer 114 can be configured to induce the compiled dataset 120 of interactions from the telemetry monitor 112 into an interaction graph 130. In certain embodiments, as shown in
In certain embodiments, the edges 134 can be weighted, for instance, based on how many recipients an email 113 is addressed to. For example, an email 113 addressed to only one user 101 can be assigned a higher weight than an email 113 addressed to many users 101. Thus, in one example, the graph inducer 114 can be configured to assign a weight to an edge 134 that is an inverse of the number of recipients the email 113 is addressed to. In other examples, the graph inducer 114 can be configured to assign a fixed weight (e.g., one) to all emails 113 while filtering out emails 113 addressed to more than a threshold number (e.g., four) of recipients. Thus, as shown in
Upon inducing the interaction graph 130 from the dataset 120 of interactions, the graph analyzer 116 can be configured to apply graph embedding to the induced interaction graph 130 to generate a vertex level tensor-based embedding for each user 101 represented by the vertices 132 in the interaction graph 130. In computing, graph embedding generally refers to techniques used to transform vertices, edges, and associated features (e.g., represented by weights of edges) into tensors in a vector space of certain dimensions (e.g., 256 dimensions) while maximally preserving graph structure and information.
According to one graph embedding technique, a vertex in a graph can be represented as a combination of non-linear transformations of an aggregation of features (e.g., the number of emails 113 exchanged) from connected neighbors of the vertex 132 and features of the vertex 132 itself. For example, when user A is connected to users B, C, and D in the interaction graph 130, features of vertex A representing user A can be computed as a non-linear transformation of an aggregate of features from vertices 132 corresponding to users B, C, and D combined with a non-linear transformation of features of vertex A by applying encoding functions. When users B, C, and D are connected to additional users in the interaction graph 130, each user B, C, and D can be represented similarly by applying additional encoding functions. As such, each vertex 132 in the interaction graph 130 can have a corresponding computational graph that captures neighborhood structure of the interaction graph 130 around the vertex as well as features of the vertex 132 and corresponding neighbors.
The encoding functions for both the aggregate features of connected neighbors and the vertex 132 itself can be developed via machine learning by, for example, using a “neural network” or “artificial neural network” configured to “learn” or progressively improve performance of tasks by studying known examples. In certain implementations, a neural network can include multiple layers of objects generally referred to as “neurons” or “artificial neurons.” Each neuron can be configured to perform a function, such as a non-linear activation function, based on one or more inputs via corresponding connections. Artificial neurons and connections typically have a contribution value that adjusts as learning proceeds. The contribution value increases or decreases a strength of an input at a connection. Typically, artificial neurons are organized in layers. Different layers may perform different kinds of transformations on respective inputs. Signals typically travel from an input layer to an output layer, possibly after traversing one or more intermediate layers.
Training the neural network to develop the encoding functions can be performed in a supervised, semi-supervised, or unsupervised fashion. For example, the neural network can be trained using a loss function based on random walks (e.g., node2vec, DeepWalk, struc2vec, etc.), graph factorization, or node proximity in a graph. During graph embedding, the graph analyzer 116 can also be configured to tune how the neural network performs graph embedding. For example, when performing random walks within node2vec, the graph analyzer 116 can apply differing numbers of random walks based on a degree centrality of a vertex in the graph 130. As such, oversampling of vertices 132 with very few connections could be prevented. In another example, when performing window calculations (e.g., how many vertices are considered in a window), the graph analyzer 116 can be configured to use a very narrow window size (e.g., two). Thus, by using such techniques, the graph analyzer 116 can develop encoding functions for both the aggregate of connected neighbors as well as for each vertex itself.
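The effect of a narrow window size can be illustrated by enumerating the (center, context) training pairs that a skip-gram style loss would draw from a single random walk. The function `context_pairs` is hypothetical, and interpreting the window size as the maximum number of positions between a center vertex and a context vertex is an assumption borrowed from common word-embedding practice rather than a detail taken from this description.

```python
def context_pairs(walk, window=2):
    """List (center, context) pairs for vertices within `window` steps of each other."""
    pairs = []
    for i, center in enumerate(walk):
        start = max(0, i - window)
        stop = min(len(walk), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# A made-up walk over four vertices.
walk = ["A", "B", "C", "D"]
narrow = context_pairs(walk, window=1)
wider = context_pairs(walk, window=2)
print(len(narrow))  # 6
print(len(wider))   # 10
```

A narrower window restricts the pairs to vertices that interacted directly or nearly directly along the walk, so the learned embedding emphasizes immediate interaction partners over more distant ones.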
The encoding functions (or machine learning models) can then be used to convert each vertex 132 in the graph 130 into a graph embedding 140 having tensors 142 in a vector space individually representing a position and/or a level of interaction between users in the social network. For example, as shown in
Upon obtaining the tensors 142 corresponding to the users 101 represented by the vertices 132 in the graph 130, the social distance quantifier 118 can be configured to quantify a social distance based on interactions between a pair of users 101 using corresponding tensors 142. For example, as shown in
Based on the tensor distances 146, the social distance quantifier 118 can also be configured to rank users 101 for closeness to a user 101 based on interactions among the users 101. For example, as shown in
Several embodiments of the disclosed technology can be applied to resolve the technical issue of quantifying degrees of interactions in a computing platform such as a social network using machine learning. By constructing the interaction graph 130, interaction features or properties, such as exchange of emails, the number of emails exchanged, recency of exchanged emails, etc., can be graphically represented. The graphically represented interaction features can then be converted into vertex level tensors 142 in a vector space via graph embedding. Using the vertex level tensors 142, the data processor 110 can readily determine social distance values based on interactions between the users 101 as tensor distances 146 between pairs of the tensors 142. As such, social distance values corresponding to degrees of interactions of the users 101 can be readily quantified and visualized.
The quantified social distance values can be useful in providing suggestions of potential connections, content, or organizations based on user interactions in a social network. For example, when a new user (e.g., User T) joins a social network to replace a previous user (e.g., User A), the data processor 110 can be configured to suggest to the new user potential connections of other users 101 in the social network according to a ranking of social distance of the other users 101 to the previous user (e.g., User A). In another example, when a new user joins a social network but not to replace any other users 101, an average, median, or other suitable values of tensors from other users 101 can be used to calculate estimated social distances from users 101. As such, the new user is likely to quickly establish valuable relationships with other users 101 with whom the new user is likely to interact. In further examples, the data processor 110 can also be configured to similarly suggest content items or groups to the new user such that the new user can be exposed to likely useful information or activities.
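For the case in which a new user does not replace anyone, estimating a position from the tensors of other users can be sketched as below. The function names, the element-wise application of the mean or median, the use of Euclidean distance for ranking, and the sample embeddings are all assumptions made for illustration.

```python
import math
from statistics import mean, median


def estimate_embedding(team_embeddings, how="mean"):
    """Estimate a new user's embedding as the element-wise mean or median."""
    if how == "mean":
        reduce = mean
    elif how == "median":
        reduce = median
    else:
        raise ValueError("how must be 'mean' or 'median'")
    return [reduce(column) for column in zip(*team_embeddings.values())]


def suggest_connections(team_embeddings, how="mean"):
    """Rank existing users by Euclidean distance to the estimated embedding."""
    estimate = estimate_embedding(team_embeddings, how)
    return sorted(
        team_embeddings,
        key=lambda user: math.dist(estimate, team_embeddings[user]),
    )


# Made-up two-dimensional embeddings for three existing users.
team = {"A": [0.0, 0.0], "B": [2.0, 0.0], "C": [4.0, 6.0]}

print(estimate_embedding(team, "mean"))    # [2.0, 2.0]
print(estimate_embedding(team, "median"))  # [2.0, 0.0]
print(suggest_connections(team))           # ['B', 'A', 'C']
```

The median is less sensitive than the mean to a single outlying user such as C in this example, which may matter when one existing user interacts very differently from the rest.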
As shown in
Depending on the desired configuration, the system memory 306 can be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof. The system memory 306 can include an operating system 320, one or more applications 322, and program data 324. This described basic configuration 302 is illustrated in
The computing device 300 can have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 302 and any other devices and interfaces. For example, a bus/interface controller 330 can be used to facilitate communications between the basic configuration 302 and one or more data storage devices 332 via a storage interface bus 334. The data storage devices 332 can be removable storage devices 336, non-removable storage devices 338, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few. Example computer storage media can include volatile and nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. The term “computer readable storage media” or “computer readable storage device” excludes propagated signals and communication media.
The system memory 306, removable storage devices 336, and non-removable storage devices 338 are examples of computer readable storage media. Computer readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other media which can be used to store the desired information, and which can be accessed by computing device 300. Any such computer readable storage media can be a part of computing device 300. The term “computer readable storage medium” excludes propagated signals and communication media.
The computing device 300 can also include an interface bus 340 for facilitating communication from various interface devices (e.g., output devices 342, peripheral interfaces 344, and communication devices 346) to the basic configuration 302 via bus/interface controller 330. Example output devices 342 include a graphics processing unit 348 and an audio processing unit 350, which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 352. Example peripheral interfaces 344 include a serial interface controller 354 or a parallel interface controller 356, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 358. An example communication device 346 includes a network controller 360, which can be arranged to facilitate communications with one or more other computing devices 362 over a network communication link via one or more communication ports 364.
The network communication link can be one example of a communication media. Communication media can typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and can include any information delivery media. A “modulated data signal” can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media. The term computer readable media can include both storage media and communication media.
The computing device 300 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. The computing device 300 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
From the foregoing, it will be appreciated that specific embodiments of the disclosure have been described herein for purposes of illustration, but that various modifications may be made without deviating from the disclosure. In addition, many of the elements of one embodiment may be combined with other embodiments in addition to or in lieu of the elements of the other embodiments. Accordingly, the technology is not limited except as by the appended claims.