The field of the invention relates in general to extraction of information from public social networks. More particularly, the invention relates to a method and system for determining by a third party a human hierarchical structure of an organization, based on information which is publicly provided by a social network.
In recent years, online social networks have grown in scale and variety, and today they offer individuals the possibility of publicly presenting themselves, exchanging ideas with friends or colleagues, and networking on a scale and in a manner that was impossible a few years ago. For example, Facebook has more than a billion registered users, with many new users signing up each month. According to recent statistics published by Facebook, 50% of Facebook users log onto the site on a daily basis, with an average total time of more than 7 hours per month and more than 30 billion pieces of content shared each month (web links, news stories, blog posts, notes, photo albums, etc.). On the one hand, social networks have created new opportunities to develop friendships, share ideas, and conduct business. On the other hand, many social network users expose, via their profile pages, personal and community details that relate, among other things, to their social connections and their place of employment. Sometimes, sensitive business data is also unintentionally exposed.
The art has shown that it is possible to extract a network (members, connections between people, etc.) from data available at a social networking service (e.g., Facebook, Twitter, LinkedIn, etc.). This can be done, for example, by extracting the connections between various members, starting from a single member, and expanding the structure until the entire network is determined. The point at which to stop the “extraction” of the network may be predefined by size, by characteristics of the network members, etc. Said network can be clustered into social communities.
There are various cases in which there is a need for a third party to determine the human structure of a commercial organization without receiving assistance from the organization itself, or from any of its employees. The term “organization” refers herein to a hierarchical body which employs workers. The term “structure of the organization” refers herein to the division of the organization into departments, to the hierarchical structure of the organization as a whole and of each of its departments, and to the leading personnel in each department. There may be various reasons for such a need, such as commercial, financial, intelligence, or human resourcing purposes. In many cases, structural data of organizations is not publicly available. In other cases, only a few pieces of data are available for an organization, which do not enable the construction of the complete structure. The term “complete structure” refers herein to the whole structure of an organization, to the departmental division of the organization, or to a structure of one or more departments within the organization.
The data which the art is able to extract from publicly available social networks is, however, insufficient to determine the structure of a commercial organization. The extraction of a community structure by the prior art fell short of determining the departmental and human structure of organizations using data extracted from publicly available social networks. Moreover, the art fell short of determining the hierarchy and leadership structure of organizations using said data.
A user in Facebook is requested to provide some of his biographical data, such as his name, gender, place of residence, educational data, hobbies, etc. Of particular relevance to the present invention, the user also has the option of indicating his present workplace, as well as previous ones. In another aspect, Facebook allows a user to search the database by keywords. For example, if a user types the keywords “Elite Inc.”, he receives access to the web page of this company in Facebook. However, in the vast majority of cases, this will not lead to the structure of the company. In LinkedIn, typing the words “Elite Inc.” may provide a list of workers in this company; in general, however, structural data for this company is missing unless specifically listed. Construction of a human structure of an organization (such as a company, corporation, etc.) may sometimes be possible based on data available from social networks. However, this structural construction can typically be performed only when the relevant data is directly available, and it typically requires a significant amount of lengthy manual work.
Various limitations are applied by social networks on searching their databases. For example, upon typing in LinkedIn the word “IBM”, only a limited list of the IBM workers is provided (for example 300 workers), which does not enable construction of the complete structure of this corporation. In another example, Facebook allows carrying out two operations with respect to each person in its database: (a) extraction of the profile page with personal details for that person; and (b) asking for all the friends of that person. However, Facebook throttles massive crawling attempts by limiting the number of operations performed by a single account or from a single IP address. As will be shown, the present invention can operate even under such limitations.
It is therefore an object of the present invention to provide a method and system for constructing a human structure of an organization based on data which is publicly available from a social network.
It is another object of the present invention to provide such a method which overcomes search limitations that are typically applied by social networks.
It is still another object of the present invention to provide a method which applies indirect tools for overcoming the lack of structural data with respect to departmental structure and leadership positions.
It is still another object of the present invention to provide such a method which can be almost entirely automated.
Other objects and advantages of the invention will become apparent as the description proceeds.
The invention relates to a method for determining by a third party a structure of a commercial organization based on data extracted from one or more public social networks, which comprises the steps of: (a) determining the list of employees in the organization by: (a.1) defining a list of employees, and adding a few names of known employees to said list; (a.2) defining a list of potential employees; (a.3) extracting from a public social network the list of friends of each of the employees already in said list of employees, and adding the names in all said friends lists to said list of potential employees; (a.4) for each of the names in said list of potential employees, checking whether they are connected in the public social network with one or more of the names already in said list of employees, and sorting said list of potential employees such that those names having more of such connections appear at the top of the list; (a.5) for each of those names appearing at the top of the list of potential employees, checking in their profile whether they work in the organization, and if so, adding them to said list of employees, or otherwise dropping them from said list of potential employees; (a.6) extracting the list of friends of one or more of said names newly added to the list of employees, and repeating the procedure from step (a.4) above; and (a.7) continuing with the procedure until some threshold is met, thereby completing said list of employees; (b) producing from said list of employees a network representation based on the connections between the various employees; (c) dividing said network representation into a departmental structure, using a community detection algorithm, and assigning a role to each of said departments by checking the profiles of members in each department and finding a common denominator for the members in each department; and (d) determining leadership positions within the organization by use of centrality measures.
Preferably, said community detection algorithm is a Girvan-Newman fast greedy algorithm.
Preferably, said centrality measures are selected from eigenvector centrality, PageRank, closeness, HITS, betweenness, or communicability centrality.
Preferably, said threshold is selected from: (a) a specific number of names that are sequentially checked in said list of potential employees, none of which is found to work in the organization; (b) a specific number of employees that have been determined and included in said list of employees; or (c) the list of potential employees being empty.
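The employee-list procedure of steps (a.1) to (a.7), together with the three thresholds just listed, can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation: the function `crawl_employees`, its parameters `get_friends`, `works_at_org`, `max_misses` and `max_employees`, and the toy data are all hypothetical names introduced here, and a real crawler would obtain friends lists and workplaces from the social network rather than from in-memory dictionaries.

```python
from collections import Counter

def crawl_employees(seeds, get_friends, works_at_org,
                    max_misses=3, max_employees=1000):
    """Hypothetical sketch of steps (a.1)-(a.7); all names are illustrative."""
    employees = set(seeds)   # (a.1) list of employees, seeded with known names
    dropped = set()          # names already checked and found not to be employees
    link_count = Counter()   # (a.2) potential employee -> links to known employees

    def expand(member):
        # (a.3)/(a.6): add a member's friends to the list of potential employees
        for friend in get_friends(member):
            if friend not in employees and friend not in dropped:
                link_count[friend] += 1

    for seed in seeds:
        expand(seed)

    misses = 0
    # (a.7): stop on consecutive misses, on a size limit, or on an empty list
    while link_count and misses < max_misses and len(employees) < max_employees:
        # (a.4): the candidate with the most links to known employees comes first
        candidate, _ = link_count.most_common(1)[0]
        del link_count[candidate]
        if works_at_org(candidate):      # (a.5) profile check
            employees.add(candidate)
            misses = 0
            expand(candidate)
        else:
            dropped.add(candidate)
            misses += 1
    return employees

# Toy data standing in for a social network (entirely made up).
FRIENDS = {
    "ann": ["bob", "cat", "zed"],
    "bob": ["ann", "cat", "dan"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
    "zed": ["ann"],
}
EMPLOYER = {"ann": "Elite Inc.", "bob": "Elite Inc.", "cat": "Elite Inc.",
            "dan": "Elite Inc.", "eve": "Other", "zed": "Other"}

found = crawl_employees(
    ["ann"],
    get_friends=lambda name: FRIENDS.get(name, []),
    works_at_org=lambda name: EMPLOYER.get(name) == "Elite Inc.",
)
print(sorted(found))
```

Checking the best-connected candidates first keeps the number of profile requests low, which matters when the social network limits the number of operations per account or IP address.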
In the drawings:
As noted above, the present invention relates to a method for determining the hierarchical structure of an organization, using data from a social network, for example, Facebook. The method is partially indirect, as it includes some determinations with respect to the departmental division of the organization as well as determination of leadership personnel that are not explicitly indicated anywhere in the social network. As will be shown, the method of the invention is mainly based on analyzing the connections between people, or more particularly the method is based on analysis of “friends” lists of persons within Facebook (or another social network).
The creation of the list of employees of the organization will now be described in more detail, with respect to
When list 1 has been formed, the network between the given workers in this list is also available or can easily be extracted (step 13 of
In the next step (14 in
After separating the social network of the organization into disjoint communities, step 14 continues by analyzing the role of each of the detected communities of the organization. This task can be performed, for example, by retrieving position descriptions and locations of residence from the social network (such as Facebook) profile pages of several community members, until the common denominator of all the community members is determined. For example, the procedure of step 14 may randomly pick several dozen users from a community. For these users, the procedure inspects the users' positions within the organization by using publicly available professional networking resources, such as LinkedIn. In such a manner, each of said communities is assigned a respective role.
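The role-assignment part of step 14 can be illustrated with a short sketch. It assumes, purely for illustration, that the "common denominator" is approximated by the keyword appearing in the position descriptions of the most sampled members; the text does not prescribe this rule. The names `assign_roles`, `get_position` and `sample_size`, and the toy data, are hypothetical.

```python
import random
from collections import Counter

def assign_roles(communities, get_position, sample_size=30, seed=0):
    """Return one role keyword per community (None if no data is available)."""
    rng = random.Random(seed)
    roles = []
    for members in communities:
        ordered = sorted(members)
        # Randomly pick up to `sample_size` members of the community.
        sample = rng.sample(ordered, min(sample_size, len(ordered)))
        keywords = Counter()
        for user in sample:
            position = get_position(user)  # e.g. taken from a public profile
            if position:
                # Count each keyword at most once per user.
                keywords.update(set(position.lower().split()))
        # Assumption: the most widely shared keyword stands in for the role.
        roles.append(keywords.most_common(1)[0][0] if keywords else None)
    return roles

# Toy data (entirely made up).
POSITIONS = {
    "ann": "Software Engineer",
    "bob": "Senior Software Engineer",
    "cat": "Software Tester",
    "dan": "Sales Manager",
    "eve": "Sales Representative",
}
communities = [{"ann", "bob", "cat"}, {"dan", "eve"}]
roles = assign_roles(communities, POSITIONS.get)
print(roles)
```

In practice the position text would need cleaning (stop words, job-title synonyms, location fields), which this sketch omits.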
Corporation 1 is an international IT Corporation which provides products and services to customers around the world. According to the company's web page, the company currently employs more than 50,000 employees. An organization crawler was used in step 12 of
After determination of the various communities in the organization, and the role of each community in the network the procedure continues to step 15 of
The procedure of step 15 analyzes the organizational network representation created in step 13. Let G=<V,E> be the network representation, where each node v∈V is a Facebook user who is associated with the target organization and each edge (u,v)∈E represents a Facebook friendship link between two users. It is possible to pinpoint leadership roles by analyzing solely the structure of G. First, for each user v∈V in G, the procedure calculates several centrality measures. Next, for each centrality measure, the procedure determines the top users (for example, 10 to 20) who received the maximal score. This role determination may be verified from each of said users' profiles in Facebook. If the information in one public social network is not enough to reveal the users' leadership positions within the organization, the leadership positions may be verified from other online sources, such as LinkedIn and the Google search engine. By using these methods, the inventors succeeded, in most cases, in accurately revealing the users' leadership positions.
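As an illustration of the ranking in step 15, the sketch below computes one of the named measures, closeness centrality, on a small undirected graph and selects the top-scoring users. It assumes a connected graph stored as a symmetric adjacency dictionary; the function names `closeness_centrality` and `top_users` and the toy graph are hypothetical, and the other measures (PageRank, HITS, betweenness, etc.) would be computed analogously with a graph library.

```python
from collections import deque

def closeness_centrality(graph):
    """Closeness of each node: (reachable nodes) / (sum of shortest distances)."""
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

def top_users(scores, k):
    """The k users with the highest score (ties broken alphabetically)."""
    return sorted(scores, key=lambda user: (-scores[user], user))[:k]

# Toy friendship graph G = <V, E> (entirely made up).
G = {
    "mgr": {"a", "b", "c", "d"},
    "a": {"mgr", "b"},
    "b": {"mgr", "a"},
    "c": {"mgr"},
    "d": {"mgr"},
}
scores = closeness_centrality(G)
print(top_users(scores, 3))
```

The users returned by `top_users` are only candidates for leadership roles; as the text notes, they would still be verified against profile data or other online sources.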
Based on said centrality measures and verification results, machine learning algorithms may be used to build classifiers that can automatically identify management roles inside an organization based on the different centrality measures of the vertices in the network representation. By using these classifiers, it is possible to find a wider range of management roles, relying on complex criteria that combine centrality measures.
Furthermore, similar means may be used to reveal different statistics about the organization. For example, using said means, the inventors could estimate the percentage of management positions and the number of employees inside several organizations.
Table 1 illustrates the verification of the leadership identification procedure for the top 10 and top 20 employees as identified using the various centrality measures. The results indicate that each of the calculated centrality measures can assist in identifying managers within an organization. The table shows this verification as done for two small organizations S1 and S2, two medium-size organizations M1 and M2, and two large-scale corporations L1 and L2. The various centrality measures that have been used are listed in the top row, and are as follows: closeness centrality (Closeness), betweenness centrality (BC), eigenvector centrality (EC), HITS, PageRank, communicability centrality (CC), and load centrality (LC). Closeness demonstrated the highest average precision at 20 (0.78), while PageRank received the lowest score (0.7).
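The "precision at 10" and "precision at 20" figures reported above follow the usual definition: the fraction of the top k ranked users who were verified as holding a leadership position. A minimal sketch of that calculation is given below; the function name `precision_at_k` and the sample data are hypothetical and are not taken from Table 1.

```python
def precision_at_k(ranked_users, managers, k):
    """Fraction of the top-k ranked users that are verified managers."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_users[:k]
    return sum(1 for user in top if user in managers) / k

# Made-up ranking produced by some centrality measure, best first.
ranked = ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8", "u9", "u10"]
# Made-up set of users verified as managers (e.g. from their profiles).
verified_managers = {"u1", "u2", "u3", "u5", "u6", "u8", "u9"}

p5 = precision_at_k(ranked, verified_managers, 5)
p10 = precision_at_k(ranked, verified_managers, 10)
print(p5, p10)
```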
The above results show that high centrality within a network representation of an organization is a good indication of a leadership role within the organization.
As demonstrated, the invention provides a method which enables a third party (i.e., a person who is external to the organization) to construct a structure of the organization in terms of names of employees, departmental structure, and leadership positions, using public social networks. The method of the invention overcomes typical limitations that are imposed by public social networks on the extraction of data from their databases, and shows that this construction is feasible.
While some embodiments of the invention have been described by way of illustration, it will be apparent that the invention can be carried into practice with many modifications, variations and adaptations, and with the use of numerous equivalents or alternative solutions that are within the scope of persons skilled in the art, without departing from the spirit of the invention or exceeding the scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
223544 | Dec 2012 | IL | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IL2013/051011 | 12/9/2013 | WO | 00 |