The following generally relates to a prediction system for geographical locations of users based on social and spatial proximity, and related methods.
Location is one of the most important data tags used to direct computations, recommendations, information and services to specific user accounts or user devices. For example, geo-targeting in digital advertising allows for significant personalization and accurate measurement. In addition, with the huge increase in the number of wearable computing devices, geo-targeting has never been more powerful.
In traditional media, most geo-targeting is implicit. For example, if a person places an advertisement in a physical newspaper called the Toronto Star, only people in Toronto will see the advertisement. However, in digital media that assumption no longer holds true. Anyone with access to the Internet can log in to his/her social media account from anywhere, making geo-location dynamic rather than static as traditionally assumed. There is also a one-to-many mapping from a person to geo-locations; in other words, a person may be associated with multiple locations.
Embodiments will now be described by way of example only with reference to the appended drawings wherein:
It will be appreciated that for simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the example embodiments described herein. However, it will be understood by those of ordinary skill in the art that the example embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the example embodiments described herein. Also, the description is not to be considered as limiting the scope of the example embodiments described herein.
Geo-location (also called geographic location) for social media users typically has to be inferred, as only a very small percentage of users disclose their location. For example, it is herein recognized that on the social data network called Twitter only 1.8% of users have specified their location, and many of these specified locations are spurious.
Typically, geographically locating users relies mainly on mapping users' Internet Protocol (IP) addresses to known or predicted locations. While this approach seems to work relatively well in e-commerce or social media environments, or for Internet service providers, companies that have only secondary access to social data (e.g. companies that lease the social data) have limited or no access to users' IP addresses and other useful sign-in information for privacy reasons. This poses a significant technical challenge and makes the user geo-location inference task even harder.
Furthermore, it is herein recognized that IP addresses may be incorrect or may misrepresent a user due to IP routing and IP masking processes provided by intermediary Internet services. Therefore, IP addresses, even if available, may not reflect the location of a user.
It is herein recognized that there are also different types of location associated with a user account, including Home Location, Current Location and Location(s) of Interest. The Home Location is a location that a user specifies while signing up (e.g. it can be obtained from the user profile, such as the Twitter user JSON object). The Current Location is a location from which a user is currently sending a message (e.g. it can be obtained from the user message if location services are activated, such as the Tweet JSON objects). The Location(s) of Interest are the locations of friends that a user follows (e.g. they can be obtained from a Friends-Follower relationship graph). Identifying the true Home Location is very difficult, as users may purposely withhold this information.
It is herein proposed to infer the geo-locations of social media users using the self-disclosed locations of some users (herein referred to as seeds), social media relationships such as Follower and Friend, and the content of social media users, such as tweets, posts, etc.
Below are some assumptions:
Geography, social relationship, and social contents are highly intertwined.
Relationships formed between people living in the same geographical areas carry over to the Internet.
The geography and social environment that a person experiences dictates the online relationships he/she forms.
Social networking platforms include users who generate and post content for others to see, hear, etc. (e.g. via a network of computing devices communicating through websites associated with the social networking platform). Non-limiting examples of social networking platforms are Facebook, Twitter, LinkedIn, Pinterest, Tumblr, blogospheres, websites, collaborative wikis, online newsgroups, online forums, emails, and instant messaging services. Both currently known and future social networking platforms may be used with the principles described herein.
The term “post” or “posting” refers to content that is shared with others via social data networking. A post or posting may be transmitted by submitting content onto a server, website or network for others to access. A post or posting may also be transmitted as a message between two devices. A post or posting includes sending a message, an email, placing a comment on a website, placing content on a blog, posting content on a video sharing network, and placing content on a networking application. Forms of posts include text, images, video, audio and combinations thereof. In the example of Twitter, a tweet is considered a post or posting.
The term “follower”, as used herein, refers to a first user account (e.g. the first user account associated with one or more social networking platforms accessed via a computing device) that follows a second user account (e.g. the second user account associated with at least one of the social networking platforms of the first user account and accessed via a computing device), such that content posted by the second user account is published for the first user account to read, consume, etc. For example, when a first user follows a second user, the first user (i.e. the follower) will receive content posted by the second user. In some cases, a follower engages with the content posted by the other user (e.g. by sharing or reposting the content). A follower may also be called a friend.
In the proposed system and method, weighted edges or connections are used to develop a network graph, and several different types of edges or connections are considered between different user nodes (e.g. user accounts) in a social data network. These types of edges or connections include: (a) a follower relationship in which a user follows another user; (b) a re-post relationship in which a user re-sends or re-posts the same content from another user; (c) a reply relationship in which a user replies to content posted or sent by another user; and (d) a mention relationship in which a user mentions another user in a posting.
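A minimal Python sketch of such a typed, weighted network graph follows. The per-type weights in `DEFAULT_WEIGHTS` are hypothetical placeholders, since the description does not say how each relationship type is weighted, and the names `EdgeType` and `SocialGraph` are illustrative only.

```python
from collections import defaultdict
from enum import Enum


class EdgeType(Enum):
    """The four relationship types (a) to (d) described above."""
    FOLLOW = "follow"
    REPOST = "repost"
    REPLY = "reply"
    MENTION = "mention"


# Hypothetical weights per relationship type (not specified in the text).
DEFAULT_WEIGHTS = {
    EdgeType.FOLLOW: 1.0,
    EdgeType.REPOST: 0.8,
    EdgeType.REPLY: 0.6,
    EdgeType.MENTION: 0.4,
}


class SocialGraph:
    """Directed user graph; repeated interactions accumulate edge weight."""

    def __init__(self, weights=None):
        self.weights = weights if weights is not None else DEFAULT_WEIGHTS
        # adjacency[source][target] -> accumulated weight
        self.adjacency = defaultdict(lambda: defaultdict(float))

    def add_interaction(self, source, target, edge_type):
        """Record that `source` followed/reposted/replied to/mentioned `target`."""
        self.adjacency[source][target] += self.weights[edge_type]

    def weight(self, source, target):
        """Accumulated weight of the source -> target edge (0.0 if absent)."""
        return self.adjacency.get(source, {}).get(target, 0.0)

    def neighbours(self, source):
        """Targets of `source`, strongest connection first."""
        targets = self.adjacency.get(source, {})
        return sorted(targets, key=targets.get, reverse=True)
```

Accumulating weights on a single directed edge keeps the graph compact while still letting a user who both follows and frequently mentions another account register as a stronger connection.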
In a non-limiting example of a social network under the trade name Twitter, the relationships are as follows:
Re-tweet (RT): Occurs when one user shares the tweet of another user. Denoted by “RT” followed by a space, the symbol @, and the Twitter user handle, e.g., “RT @ABC” followed by a tweet from ABC.
@Reply: Occurs when a user explicitly replies to a tweet by another user. Denoted by the ‘@’ sign followed by the Twitter user handle and then any message, e.g., @username followed by the reply text.
@Mention: Occurs when one user includes another user's handle in a tweet without meaning to explicitly reply. A user includes an @ followed by some Twitter user handle somewhere in his/her tweet, e.g., Hi @XYZ let's party @DEF @TUV
These relationships denote an explicit interest from the source user handle towards the target user handle. The source is the user handle who re-tweets or @replies or @mentions and the target is the user handle included in the message. It will be appreciated that the nomenclature for identifying the relationships may change with respect to different social network platforms. While examples are provided herein with respect to Twitter, the principles also apply to other social network platforms.
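The relationship types above can be recognized in raw tweet text with simple patterns. The sketch below is a heuristic assumption, not a platform API: it treats a leading “RT @handle” as a re-tweet, a leading “@handle” as an @Reply, and every other handle as an @Mention. Reply metadata supplied by the platform, where available, is more reliable than inferring replies from text.

```python
import re

RETWEET = re.compile(r"^RT @(\w+)")
LEADING_HANDLE = re.compile(r"^@(\w+)")
HANDLE = re.compile(r"@(\w+)")


def extract_relationships(source, text):
    """Return (source, target, kind) tuples found in one tweet's text."""
    retweet = RETWEET.match(text)
    if retweet:
        # Handles inside the re-tweeted text belong to the original author.
        return [(source, retweet.group(1), "retweet")]
    relations = []
    rest = text
    reply = LEADING_HANDLE.match(text)
    if reply:
        relations.append((source, reply.group(1), "reply"))
        rest = text[reply.end():]
    # Any remaining handle is treated as an @Mention.
    for target in HANDLE.findall(rest):
        relations.append((source, target, "mention"))
    return relations
```

For example, `extract_relationships("me", "Hi @XYZ let's party @DEF @TUV")` yields three mention edges with "me" as the source.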
To illustrate the proposed approach, consider the network graph in
Turning to
The server system 101A includes one or more processors 104. In an example embodiment, the server system includes multi-core processors. In an example embodiment, the processors include one or more main processors and one or more graphic processing units (GPUs). GPUs are typically used to process images (e.g. computer graphics), but they may also be used herein to process social data, which may, for example, be graph data (e.g. nodes and edges).
The server system also includes one or more network communication devices 105 (e.g. network cards) for communicating over a data network 119 (e.g. the Internet, a closed network, or both).
The server system further includes one or more memory devices 106 that store one or more relational databases 107, 108, 109 that map the activity and relationships between user accounts. The memory further includes a content database 110 that stores content generated, posted, consumed, re-posted, etc. by users. The content includes text, images, audio data, video data, or combinations thereof. The memory further includes a non-relational database 111 that stores friends and followers associated with given users. The memory further includes a seed user database 112 that stores seed user accounts having known locations, and a geo-inference results database 113.
The memory 106 also includes a geo-inference application 114, a contextual similarity module 116, a geo-spatial similarity module 117, and a geo-inference module 118. In an example embodiment, the application 114 calls upon one or more of the modules 116, 117, and 118.
The server system 101A may be in communication with one or more third party servers 102 over the network 119. Each third party server has a processor 120, a memory device 121 and a network communication device 122. For example, the third party servers are the social network platforms (e.g. Twitter, Instagram, Snapchat, Facebook, etc.), which store the social data and send it to the server system 101A.
The server system 101A may also be in communication with one or more user computer devices 103 (e.g. mobile devices, wearable computers, desktop computers, laptops, tablets, etc.) over the network 119. The computer device includes one or more processors 123, one or more GPUs 124, a network communication device 125, a display screen 126, one or more user input devices 127, and one or more memory devices 128. The computer device has stored thereon, for example, an operating system (OS) 129, an Internet browser 130 and a geo-inference application 131. In an example embodiment, the geo-inference application 114 on the server is accessed by the computer device 103 via the Internet browser 130. In another example embodiment, the geo-inference application 114 is accessed by the computer device 103 via its local geo-inference application 131. While the GPU 124 is typically used by the computing device for processing graphics, the GPU 124 may also be used to perform computations related to the social media data.
It will be appreciated that the server system 101A may be a collection of server machines or may be a single server machine.
Turning to
It will be appreciated that the distribution of the databases, the applications and the modules may vary other than what is shown in
For simplicity, the example embodiment server systems 101A or 101B, or both, will hereinafter be referred to using the reference numeral 101.
As an initial step, the server system 101 obtains one or more seed user accounts (also called seeds or seed users) 400 from the database 112. In an example embodiment, the seed user accounts are those accounts in a social networking platform having known geographic locations. The database 112, for example, is a MYSQL type database.
The one or more seeds 400 are passed by the server system 101 into its geo-inference application 114.
Responsive to receiving the seeds 400, the geo-inference application 114 obtains followers (block 401) of one or more given seeds, and passes these followers to the geo-spatial similarity module 117. The followers, for example, are obtained by accessing the database 111, which for example is an HBASE database.
In this example implementation, an HBASE distributed Titan Graph database 111 runs on top of a Hadoop Distributed File System (HDFS) to store the social network graph (e.g. in a server cluster configuration comprising fifteen server machines). In other words, in an example implementation, the server machines 303 comprise multiple server machines that operate as a cluster.
The seeds 400 and the followers are passed to the geo-spatial similarity module 117, and in response the geo-spatial similarity module obtains common friends of each seed-follower pair (block 404).
The geo-spatial similarity module 117 computes one or more geo-spatial similarity scores between a given seed user account and a given subject user. A subject user herein refers to a user account that has an unknown location, or has one or more locations that are being verified. The subject user may also be a friend or follower of one or more of the seed users, and at the very least the subject user shares common friends or followers with one or more of the seed users. For example, in
In the example embodiment, responsive to receiving the seeds 400, the application 114 further accesses the database 110 to obtain posts (e.g. Tweets) from the seed users and a given subject user, and passes these posts to the contextual similarity module 116 to compute a textual similarity score between the subject user and the one or more seed users. In an example embodiment, the text of the posts is compared to determine whether the content produced by the users is similar or relates to the same topics.
In another example embodiment, text, images, video, audio data, or combinations thereof are compared with each other to determine if the content is the same or relate to each other. For images and video data, this comparison includes pattern recognition and image processing. For audio data, this comparison includes pattern recognition and audio processing. The comparison process may also include using Deep Learning computations to obtain feature vectors, and to compare the feature vectors to each other.
In this example implementation, the content database 110 is a SOLR type database. SOLR is an enterprise search platform that runs as a standalone full-text server 302. It uses the Lucene Java search library as its core for full-text indexing and search.
Furthermore, responsive to receiving the seeds 400, the application 114 further accesses one or more of the relational databases 107, 108, 109 to determine the activity service of the seeds and the subject user. The activity service includes the replies, reposts, posts, mentions, follows, likes, dislikes, etc. between the subject user and the one or more seed users, and is used by the contextual similarity module 116 to determine an engagement score.
In this example embodiment, the databases 107, 108, 109 are respectively a HIVE database, a MYSQL database and a PHOENIX database. HIVE is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. MYSQL is a relational database management system. PHOENIX is a massively parallel, relational database layer on top of NoSQL stores such as Apache HBase. Phoenix provides a Java Database Connectivity (JDBC) driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; upsert and delete rows singly and in bulk; and query data through SQL.
The contextual similarity module 116 computes a contextual similarity score using the engagement score. In another example embodiment, the contextual similarity score is computed using both the engagement score and the textual similarity score.
The contextual similarity module 116 passes the contextual similarity score to the geo inference module 118, and the geo-spatial similarity module 117 passes the geo-spatial similarity score to the module 118.
Responsive to receiving these scores, the geo-inference module 118 determines an inferred location of the subject user and stores the inferred location result in the database 113.
The inferred location result may be used to update the locations of the subject user in other databases, including but not limited to the seed database 112.
In an example embodiment, the server system 101 does not use the contextual similarity module 116, and relies only on geo-spatial similarity computations and data to infer the location of the subject user. Example executable instructions for this process are shown in
In
At block 502, the server system 101 converts the text-based location into numerical data representing latitude and longitude coordinates. This numerical data is stored in the seed user database 112 in memory (block 503).
At block 504, the server system accesses the memory device that stores the seed user database 112 to retrieve and obtain seed users and their known latitude and longitude coordinates.
At block 505, the server system identifies a given seed user and a given subject user.
At block 506, the server system accesses the memory device storing the database 111 to obtain friends or followers, or both, that are common to both the given seed user and the given subject user.
At block 507, the server system partitions the friends or followers, or both, into buckets based on location. For example, there are: a “Toronto bucket”, a “Los Angeles bucket”, and a “New York bucket”.
At block 508, for each location bucket, the server system determines a geo-spatial similarity score for the given subject user. In other words, the subject user will have a geo-spatial similarity score for the Toronto bucket, a geo-spatial similarity score for the Los Angeles bucket and a geo-spatial similarity score for the New York bucket. The geo-spatial similarity score may be based on the number of friends or followers, or both, that the subject user has in a given location bucket. The geo-spatial similarity score, for example, is computed using the numerical distances between the seed user and the users in a given location bucket, normalized by the number of users within that location bucket. The underlying rationale is that if a subject user shares many common friends with a seed user from a given location, the subject user is most likely from the same geographic location as the seed user.
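One possible reading of the bucket scoring in blocks 507 and 508 is sketched below. The description does not give an exact formula, so the score used here, the bucket size divided by one plus the mean great-circle (haversine) distance in km from the seed user to the bucket's members, is an assumption chosen so that more common friends and shorter distances both raise the score. The function names are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def partition_into_buckets(common_friends):
    """Block 507: group (user_id, city, (lat, lon)) records by city."""
    buckets = defaultdict(list)
    for _user_id, city, coords in common_friends:
        buckets[city].append(coords)
    return buckets


def geo_spatial_scores(seed_coords, common_friends):
    """Block 508 (hypothetical formula): bucket size / (1 + mean km to seed)."""
    scores = {}
    for city, members in partition_into_buckets(common_friends).items():
        mean_km = sum(haversine_km(seed_coords, c) for c in members) / len(members)
        scores[city] = len(members) / (1.0 + mean_km)
    return scores


def infer_from_buckets(seed_coords, common_friends):
    """Return the bucket label with the highest score, or None if no buckets."""
    scores = geo_spatial_scores(seed_coords, common_friends)
    return max(scores, key=scores.get) if scores else None
```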
In another example embodiment, instead of a geo-spatial similarity score, the server system can use the information obtained from the location buckets to perform a K-Nearest Neighbor computation to directly identify the location of the subject user. In other words, the location of the subject user is classified based on its proximity to the K-nearest user accounts on a social graph, and the locations of those K-nearest user accounts. For example, the server system computes a linear combination of contextual similarity and social proximity of the subject user to the seed users on the social network graph, and executes a K-Nearest neighbour computation on that. It will be appreciated that K is a natural number.
The geo-social-spatial dimension allows the server system 101 to delimit the geographical area between any two users' known locations and thereby determine how many of the two users' common followers/friends live within that delimited geographical area. The main idea is that the likelihood of a friendship between two people increases if they have common friends who live in the same area. Conversely, this likelihood decreases with distance: the greater the distance, the less likely a person is to interact with the friends they have in common with the other person. In other words, distance also affects how a social relationship persists over time.
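The delimited area could be modelled in several ways; the sketch below assumes, hypothetically, a latitude/longitude bounding box spanned by the two users' known locations plus a small margin, and counts the common friends/followers whose known coordinates fall inside it. It does not handle boxes that cross the antimeridian.

```python
def delimited_box(loc_a, loc_b, margin_deg=0.5):
    """Bounding box (lat_min, lat_max, lon_min, lon_max) spanned by two points."""
    lat_min, lat_max = sorted((loc_a[0], loc_b[0]))
    lon_min, lon_max = sorted((loc_a[1], loc_b[1]))
    return (lat_min - margin_deg, lat_max + margin_deg,
            lon_min - margin_deg, lon_max + margin_deg)


def common_friends_in_area(loc_a, loc_b, common_friend_coords, margin_deg=0.5):
    """Count common friends/followers whose (lat, lon) lies inside the box."""
    lat_min, lat_max, lon_min, lon_max = delimited_box(loc_a, loc_b, margin_deg)
    return sum(
        1
        for lat, lon in common_friend_coords
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    )
```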
Continuing with
At block 510, the server system stores the inference result (e.g. the inferred location) in memory. At block 511, the server system updates one or more databases using the inference result, for example, as feedback into the server system.
The operations of blocks 501 to 508 are performed. At block 607, which follows block 508, the server system stores the geo-spatial similarity scores for the different location buckets in memory.
Following block 505, at block 601, the server system 101 also accesses the memory device storing the content database 110 to obtain content produced, posted or consumed by the given seed user and the given subject user, or combinations thereof.
At block 602, the server system processes the content to determine a textual similarity score between the given seed user and the given subject user. For example, text from the posts in the database 110 is compared. Other types of comparisons may be made if the content is in other formats (e.g. images, video, audio, etc.). There are several ways to compute a textual similarity score. Two non-limiting examples are Levenshtein distance and mean squared error distance.
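As an illustration of the first option, the sketch below computes the Levenshtein (edit) distance with the standard two-row dynamic program and maps it to a similarity in [0, 1] by dividing by the longer string's length. That normalization is an assumption, and character-level edit distance over large volumes of posts is costly, so it would typically be applied to short or pre-processed text.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def textual_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1], where 1.0 means identical text."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```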
At block 603, the server system stores the textual similarity score in memory.
At block 604, the server system accesses the memory devices storing the relational databases 107, 108, 109 and the content database 110 to determine the activities amongst the users and to, therefore, determine an engagement score between the given seed user and the given subject user. In an example embodiment, the engagement score between a subject user and a seed user is computed as the total number of tweets of the seed user that are retweeted, @Mentioned or liked by the subject user divided by the total number of activities of the subject user on Twitter in a given time frame.
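The engagement ratio described above can be sketched as follows. The `Activity` record and its field names are hypothetical, and each retweet, @mention or like by the subject user that targets the seed user counts as one engagement, which simplifies the “number of tweets” wording (e.g. a retweet and a like of the same tweet count twice here).

```python
from dataclasses import dataclass
from datetime import datetime

ENGAGING_KINDS = {"retweet", "mention", "like"}


@dataclass
class Activity:
    """One action by a user (hypothetical record layout)."""
    actor: str           # user who performed the action
    kind: str            # e.g. "tweet", "retweet", "reply", "mention", "like"
    target: str          # user whose tweet was acted on, or who was mentioned
    timestamp: datetime


def engagement_score(subject, seed, activities, start, end):
    """Engagements of `subject` with `seed` / all `subject` activity in [start, end)."""
    window = [a for a in activities
              if a.actor == subject and start <= a.timestamp < end]
    if not window:
        return 0.0
    engaged = sum(1 for a in window
                  if a.kind in ENGAGING_KINDS and a.target == seed)
    return engaged / len(window)
```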
At block 605, the server system stores the engagement score in memory.
At block 606, the server system computes a contextual similarity score using the textual similarity score or the engagement score, or both. In an example embodiment, only the engagement score is used to compute the contextual similarity score.
At block 608, which follows block 607 and block 606, the server system uses the obtained geo-spatial similarity scores and the contextual similarity score to determine an inferred location for the given subject user. For example, a K-nearest neighbor computation is used to determine the location. In another example embodiment, the geo-spatial similarity scores are used to weight the edges between the subject user and the one or more seed users. In an example embodiment, for a given subject user, a final similarity score to every seed user is computed as a linear combination of the contextual score and the social proximity between the two, and then K-nearest neighbour is executed by the server system on the resulting weighted graph to find the seed user that is closest to the given subject user. The location of that seed user is prescribed as the most probable location of the given subject user.
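A minimal sketch of this final step, assuming per-seed contextual and social-proximity scores are already available for the subject user. The mixing weight `alpha` and the score-weighted vote among the K nearest seeds are assumptions; with `k=1` the function reduces to the variant described above, returning the location of the single closest seed user.

```python
from collections import Counter


def infer_location(contextual, proximity, seed_locations, alpha=0.5, k=3):
    """
    contextual, proximity: seed_id -> score for one subject user.
    seed_locations: seed_id -> known location label.
    alpha: hypothetical weight of the linear combination.
    """
    # Linear combination of contextual similarity and social proximity.
    combined = {
        seed: alpha * contextual.get(seed, 0.0)
        + (1 - alpha) * proximity.get(seed, 0.0)
        for seed in seed_locations
    }
    # The K most similar seeds act as the nearest neighbours.
    nearest = sorted(combined, key=combined.get, reverse=True)[:k]
    if not nearest:
        return None
    votes = Counter()
    for seed in nearest:
        votes[seed_locations[seed]] += combined[seed]
    return votes.most_common(1)[0][0]
```

Raising `k` trades sensitivity to a single very similar seed for robustness: in a case where one Los Angeles seed is the closest but several Toronto seeds are close behind, `k=1` returns Los Angeles while `k=3` can return Toronto.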
Turning to
It will also be appreciated that the operations of blocks 701 to 704 may be performed as part of block 501.
Another example embodiment of executable instructions for identifying seed users is shown in
Step 1 (block 801): Go through the Twitter data for the past D (e.g., D=30) days and get tweets with location from the Twitter API (if it exists). Collect all such tweets/retweets.
Step 2 (block 802): For each tweet/retweet found in (step 1):
Step 3: For each author A found in (step 2b):
Step 4 (block 807): Return and save the USER_LOCATION and CURRENT_LOCATION files.
Step 5 (block 808): Load CURRENT_LOCATION into Database (e.g. the PHOENIX database), and then delete CURRENT_LOCATION file.
After the process of
Therefore, turning to
Step 1 (block 901): If the highest probability of A's being at any place is greater than γ1 (e.g., γ1=0.79) and A has more than T (e.g., T=10) tweets in the USER_LOCATION file, add A to the seed set S.
Step 2 (block 902): Delete the supernodes from the list of seeds. This can be done by looking up the seeds in the Supernodes table (e.g. stored in the MySQL database). Typically, supernodes are nodes that have a very large number of followers. Non-limiting examples include Justin Bieber's Twitter user account, or the U.S. President's Twitter user account. In an example embodiment, supernodes are nodes that have more than 10 million followers.
Step 3 (block 903): For all the remaining seeds, get all <Seed, Follower of that seed> relationships by accessing a database (e.g. the HBase database).
Step 4 (block 904): Reverse all the relationship pairs to get FOLLOWER_TO_SEEDS pairs <Follower, List of Seeds>. In an example embodiment, the purpose of reversing the SeedToFollower list to the FollowerToSeed list is to be able to compute the location probabilities of each follower from the information of their seed friends in an independent and parallel way. For example, the computation is done via Spark, a trade name for a cluster computing framework.
Step 5 (block 905): For each FOLLOWER_TO_SEEDS u, execute the following:
Step 6 (block 906): Seed Expansion: For all followers of all seeds for whom the server system has predicted geographic locations in steps 1-5, determine the ones for whom the highest probability of being at any place is greater than γ2 (e.g., γ2=0.69) and who have at least L (e.g., L=5) seed friends, and add them to the seed set (also called the “Expanded seed set”).
Step 7 (block 907): For all users in Expanded seed set, execute the operations in steps 2-5.
Step 8 (block 908): For each user the server system has thus processed, do the following:
Step 9 (block 909): Load GEO_RESULT into Database (PHOENIX), and then delete GEO_RESULT file.
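Steps 1, 2, 4, 5 and 6 above can be sketched in plain Python as below; a cluster framework such as Spark would parallelize the same per-follower work. The sub-steps of Step 5 are not listed in this description, so `follower_location_probs` assumes, hypothetically, that a follower's probability for a place is the share of its seed friends located there. The thresholds default to the example values γ1=0.79, T=10, γ2=0.69 and L=5, and a 10 million follower supernode cutoff; all function names are illustrative.

```python
from collections import Counter, defaultdict


def top_place(probs):
    """(place, probability) pair with the highest probability."""
    return max(probs.items(), key=lambda kv: kv[1])


def select_seeds(user_location_probs, tweet_counts, follower_counts,
                 gamma1=0.79, min_tweets=10, supernode_threshold=10_000_000):
    """Steps 1-2: confident, active users that are not supernodes."""
    seeds = {}
    for user, probs in user_location_probs.items():
        place, p = top_place(probs)
        if (p > gamma1 and tweet_counts.get(user, 0) > min_tweets
                and follower_counts.get(user, 0) <= supernode_threshold):
            seeds[user] = place
    return seeds


def follower_to_seeds(seed_to_followers):
    """Step 4: reverse <Seed, Follower> pairs into <Follower, List of Seeds>."""
    reversed_pairs = defaultdict(list)
    for seed, followers in seed_to_followers.items():
        for follower in followers:
            reversed_pairs[follower].append(seed)
    return reversed_pairs


def follower_location_probs(seed_list, seeds):
    """Step 5 (assumed): share of the follower's seed friends at each place."""
    counts = Counter(seeds[s] for s in seed_list)
    total = sum(counts.values())
    return {place: n / total for place, n in counts.items()}


def expand_seeds(seeds, seed_to_followers, gamma2=0.69, min_seed_friends=5):
    """Step 6: add confident followers that have at least L seed friends."""
    expanded = dict(seeds)
    known = {s: f for s, f in seed_to_followers.items() if s in seeds}
    for follower, seed_list in follower_to_seeds(known).items():
        if follower in seeds or len(seed_list) < min_seed_friends:
            continue
        place, p = top_place(follower_location_probs(seed_list, seeds))
        if p > gamma2:
            expanded[follower] = place
    return expanded
```

Reversing the pairs first means each follower's probabilities depend only on its own list of seed friends, which is what allows the per-follower computation to run independently and in parallel.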
Using the operations in
In an example experiment, the server system was provided with an input comprising a dataset of 2900 Twitter users with known physical locations (e.g. latitude and longitude). In the table shown in
It will be appreciated that the systems and methods described herein do not need to use IP addresses, or to access servers storing IP addresses, in order to obtain location data. Even in cases where IP addresses are inaccurate or do not correctly represent a user, the systems and methods described herein are still able to accurately infer a user's location.
The systems and methods described herein rely on the social network relationship data stored in the databases, which are more readily available and accessible.
The systems and methods described herein may also be applied continuously (e.g. the processes are performed repeatedly). In this way, the server system is able to identify that a subject user has moved or changed location, even if the subject user's profile has not been updated to reflect the new location. For example, the server system stores a date tag associated with each inference result in the database 113, and uses the date tags to compare how the inference results for a given subject user change or remain the same over time. In this way, for example, temporary changes in location may be filtered out.
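Filtering temporary changes using date-tagged results could be sketched as follows. The run-length rule, under which a new location is accepted only after `min_run` consecutive inferences agree, is a hypothetical choice and not a rule stated in this description.

```python
from datetime import date
from typing import List, Optional, Tuple


def stable_location(history: List[Tuple[date, str]], min_run: int = 3) -> Optional[str]:
    """
    history: (date tag, inferred location) results for one subject user.
    Returns the most recent location that persisted for at least `min_run`
    consecutive inferences; shorter (temporary) changes are ignored.
    """
    stable = None
    run_location, run_length = None, 0
    for _, location in sorted(history):  # oldest result first
        if location == run_location:
            run_length += 1
        else:
            run_location, run_length = location, 1
        if run_length >= min_run:
            stable = run_location
    return stable
```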
Furthermore, in cases when a subject user has listed on their profile multiple locations, the server system is able to identify the primary location for the subject user.
In a general example embodiment, a system and method are provided to compute contextual similarity. This includes, for example, computing content similarity between seed users and followers/friends, as well as computing an engagement score between seed users and followers/friends. The system also computes geo-social-spatial similarity. The similarity scores are used in an inference computation to infer the geo-locations of the followers of the seed users, and of subject users who share common friends with the seed users. The user geo-location inference database is updated using the result. Other seed users are selected, and the process is repeated.
Below are additional general example embodiments and related aspects.
In a general example embodiment, a server system for inferring a location for a subject user is provided. It includes: a communication device configured to communicate with a data network; one or more memory devices storing a seed user database, a database storing friends and followers of users within a social data network, and a geographic inference application; and one or more processors. These one or more processors are configured to at least: access the one or more memory devices to obtain from the seed user database a seed user having a known location in text format; use the geographic inference application to convert the known location into numerical coordinates; access the one or more memory devices to identify, from the database storing friends and followers of users, friends and followers common to both the seed user and a subject user, the subject user having an unknown location and the friends and followers having known locations; use the geographic inference application to partition the friends and followers into location buckets; for each location bucket, use the geographic inference application to determine a geo-spatial similarity score; use the geographic inference application to identify the location bucket with a highest geo-spatial similarity score and establish the location of that location bucket as an inferred location of the subject user; and store the inferred location in the one or more memory devices.
In an example aspect, the one or more processors are further configured to populate the seed user database by at least: identifying user accounts in the social data network that have transmitted messages at least x times in the last y days with their respective location service activated, where x and y are natural numbers; identifying a subset of the user accounts, each of which has transmitted a majority of its messages in the last y days from one respective location; and storing the subset of the user accounts as seed users.
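This seed-selection rule can be sketched as below, assuming located messages arrive as hypothetical `(user_id, timestamp, location)` tuples. “Majority” is taken here to mean strictly more than half of a user's located messages in the window, and the defaults x=10 and y=30 are illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def find_seed_users(messages, now: datetime, x: int = 10, y: int = 30):
    """
    messages: iterable of (user_id, timestamp, location) for messages sent
    with location services on. Returns user_id -> location for users with
    at least x located messages in the last y days, one location accounting
    for a strict majority of them.
    """
    cutoff = now - timedelta(days=y)
    per_user = defaultdict(Counter)
    for user, ts, location in messages:
        if ts >= cutoff and location:
            per_user[user][location] += 1
    seeds = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        if total < x:
            continue
        location, n = counts.most_common(1)[0]
        if n > total / 2:
            seeds[user] = location
    return seeds
```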
In another example aspect, the one or more processors are further configured to populate the seed user database by at least: computing multiple probabilities respectively associated with multiple locations, the multiple locations associated with a given user account, and the multiple probabilities including a highest probability associated with a certain one of the multiple locations; responsive to determining that the highest probability is above a threshold probability, storing the given user account and the certain one of the multiple locations in the seed user database.
In another example aspect, the seed user database includes multiple seed users, including the seed user and supernode seeds, wherein the supernode seeds have more than a threshold number of followers, and the one or more processors are configured to delete the supernode seeds from the seed user database.
In another example aspect, the database storing friends and followers of users is an HBASE database implemented on multiple server machines that operate as a cluster.
In another example aspect, the one or more processors are configured to compute each one of the known locations of the friends and followers independently and in parallel using a cluster computing framework.
In another example aspect, the inferred location is stored with a date tag, and subsequent inferred locations associated with the subject user are stored with respective date tags.
In another example aspect, the geo-spatial similarity score is computed using at least numerical distances between the seed user and each of the friends and followers in a given location bucket, and a number of the friends and followers in the given location bucket.
In another general example embodiment, a server system for inferring a location for a subject user is provided. The server system includes: a communication device configured to communicate with a data network; one or more memory devices storing at least a seed database and a database storing a graph network of followers of users in a social data network, and a geographic inference application; and one or more processors. These one or more processors are configured to at least: find user accounts in a social data network that have transmitted messages at least x times in the last y days, each of the messages having location data; compute current locations from the messages; store the user accounts that have transmitted the majority of the messages from one location as seeds in the seed database; access the seed database and the database storing the graph network to retrieve the current locations of the seeds and subsequently compute the locations of the followers of the seeds.
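The final step, computing follower locations from seed locations, could be done with a simple vote. In this sketch (an assumption, not the method stated in the text) each non-seed follower is assigned the most common location among the seeds it follows, with ties going to the location seen first. The edge format and function name are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Mapping, Tuple


def propagate_seed_locations(
        seeds: Mapping[str, str],
        follow_edges: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Infer locations for followers of seeds.

    follow_edges yields (follower_id, followed_id) pairs. Assumption: a
    follower takes the most common location among the seeds it follows;
    ties go to the location seen first. Seeds keep their own location.
    """
    votes: DefaultDict[str, Counter] = defaultdict(Counter)
    for follower, followed in follow_edges:
        if followed in seeds and follower not in seeds:
            votes[follower][seeds[followed]] += 1
    return {f: counts.most_common(1)[0][0] for f, counts in votes.items()}
```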
In an example aspect, the location data comprise text data of a city name, a country name, or both, and the computed current locations comprise numeric latitude and longitude coordinates.
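Converting textual city/country data into latitude and longitude is a geocoding step. The sketch below uses a tiny hypothetical in-memory gazetteer with approximate coordinates; a real deployment would rely on a full geographic dataset or geocoding service. It falls back to a country centroid when the city is unknown and declines to guess when a bare city name is ambiguous.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

# Hypothetical, tiny gazetteer with approximate coordinates.
CITY_COORDS: Dict[Tuple[str, str], Coord] = {
    ("toronto", "canada"): (43.6532, -79.3832),
    ("ottawa", "canada"): (45.4215, -75.6972),
}
COUNTRY_CENTROIDS: Dict[str, Coord] = {"canada": (56.1304, -106.3468)}


def _norm(text: Optional[str]) -> Optional[str]:
    return text.strip().lower() if text and text.strip() else None


def to_coordinates(city: Optional[str],
                   country: Optional[str]) -> Optional[Coord]:
    """Map textual city/country data to (latitude, longitude), or None."""
    city_key, country_key = _norm(city), _norm(country)
    if city_key and country_key:
        hit = CITY_COORDS.get((city_key, country_key))
        if hit is not None:
            return hit
    elif city_key:
        # Bare city name: accept it only if it is unambiguous.
        matches = [c for (name, _), c in CITY_COORDS.items() if name == city_key]
        if len(matches) == 1:
            return matches[0]
        return None
    # Fall back to the country centroid when the city is missing or unknown.
    return COUNTRY_CENTROIDS.get(country_key) if country_key else None
```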
In another example aspect, the database storing the graph network of followers is an HBASE database implemented on multiple server machines that operate as a cluster.
In another example aspect, the seed database includes multiple seeds, including supernode seeds, wherein the supernode seeds have more than a threshold number of followers, and the one or more processors are configured to delete the supernode seeds from the seed database, and the remaining seeds in the seed database are used to compute the locations of the followers of these remaining seeds.
In another example aspect, the one or more processors are configured to compute the locations of followers of the seeds independently and in parallel using a cluster computing framework.
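Because each follower's location depends only on the seeds that follower is connected to, the per-follower computations are independent and can be distributed. A cluster framework (for example Apache Spark) would spread this work across machines; the sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` as a single-machine stand-in, with a hypothetical majority-vote rule per follower.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Mapping, Tuple


def _infer_one(item: Tuple[str, List[str]]) -> Tuple[str, str]:
    """Hypothetical per-follower rule: most common followed-seed location."""
    follower, seed_locations = item
    return follower, Counter(seed_locations).most_common(1)[0][0]


def infer_followers_parallel(follower_seed_locations: Mapping[str, List[str]],
                             workers: int = 4) -> Dict[str, str]:
    """Run the independent per-follower inferences concurrently."""
    tasks = [(f, locs) for f, locs in follower_seed_locations.items() if locs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(_infer_one, tasks))
```

Threads keep the example simple; for CPU-heavy scoring in CPython, a process pool or a real cluster job would be the more natural choice.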
In another example aspect, each of the locations of the followers of the seeds is stored with a date tag, and subsequently computed locations of the same followers are stored with respective date tags.
In another example aspect, the one or more processors are configured to use the date tags of a given follower to determine if the given follower's location changes over time or remains the same.
In another example aspect, temporary changes in the given follower's location are filtered out.
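The date-tagged history can be used to separate lasting moves from temporary ones. In this hypothetical sketch, a new location is accepted only after it appears in `min_run` consecutive dated inferences; shorter runs (a trip, for instance) are treated as temporary and filtered out. The threshold and function name are assumptions.

```python
from datetime import date
from typing import List, Optional, Tuple


def stable_location_changes(history: List[Tuple[date, str]],
                            min_run: int = 3) -> List[Tuple[date, str]]:
    """Return (start_date, location) for each lasting location change.

    Assumption: a location counts only after it appears in min_run
    consecutive dated inferences; shorter runs are treated as temporary.
    """
    changes: List[Tuple[date, str]] = []
    current: Optional[str] = None
    run_location: Optional[str] = None
    run_start: Optional[date] = None
    run_length = 0
    for day, location in sorted(history, key=lambda item: item[0]):
        if location == run_location:
            run_length += 1
        else:
            run_location, run_start, run_length = location, day, 1
        if run_length == min_run and run_location != current:
            current = run_location
            changes.append((run_start, current))
    return changes
```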
In another general example embodiment, one or more non-transitory computer readable mediums are provided that store a seed user database, a database storing friends and followers of users within a social data network, and a geographic inference application. The one or more non-transitory computer readable mediums further include executable instructions for inferring a location for a subject user, and the executable instructions, when executed, cause a server system to at least: obtain from the seed user database a seed user having a known location in text format; use the geographic inference application to convert the known location into numerical coordinates; identify, from the database storing friends and followers of users, friends and followers common to both the seed user and a subject user, the subject user having an unknown location and the friends and followers having known locations; use the geographic inference application to partition the friends and followers into location buckets; for each location bucket, use the geographic inference application to determine a geo-spatial similarity score; use the geographic inference application to identify the location bucket with a highest geo-spatial similarity score and establish the location of that location bucket as an inferred location of the subject user; and store the inferred location.
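The bucket-and-score procedure can be sketched as below. Assumptions: contacts are held in a mapping from user ID to a set of friend/follower IDs, each known location is a (label, coordinate) pair, buckets are keyed by location label, and the score function is passed in because the text leaves its exact form open. All names are hypothetical.

```python
from collections import defaultdict
from typing import (Callable, DefaultDict, List, Mapping, Optional, Sequence,
                    Set, Tuple)

Coord = Tuple[float, float]  # (latitude, longitude)
ScoreFn = Callable[[Coord, Sequence[Coord]], float]


def infer_location(seed_id: str, subject_id: str, seed_coord: Coord,
                   contacts: Mapping[str, Set[str]],
                   known: Mapping[str, Tuple[str, Coord]],
                   score_fn: ScoreFn) -> Optional[str]:
    """Return the label of the best-scoring location bucket, or None."""
    # Friends and followers shared by the seed user and the subject user.
    common = contacts.get(seed_id, set()) & contacts.get(subject_id, set())
    # Partition the shared contacts with known locations into buckets.
    buckets: DefaultDict[str, List[Coord]] = defaultdict(list)
    for user_id in common:
        if user_id in known:
            label, coord = known[user_id]
            buckets[label].append(coord)
    if not buckets:
        return None
    # Score every bucket and keep the highest-scoring one.
    scores = {label: score_fn(seed_coord, members)
              for label, members in buckets.items()}
    return max(scores, key=scores.__getitem__)
```

In the test, a plain member count stands in for the score; any distance-aware score with the same signature could be substituted.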
In another general example embodiment, one or more non-transitory computer readable mediums are provided that store at least a seed database and a database storing a graph network of followers of users in a social data network, and a geographic inference application. The one or more non-transitory computer readable mediums further include executable instructions for inferring a location for users in a social data network, and the executable instructions, when executed, cause a server system to at least: find user accounts in the social data network that have transmitted messages at least x times in the last y days, each of the messages having location data; compute current locations from the messages; store the user accounts that have transmitted the majority of the messages from one location as seeds in the seed database; and access the seed database and the database storing the graph network to retrieve the current locations of the seeds and subsequently compute the locations of the followers of the seeds.
It will be appreciated that any module or component exemplified herein that executes instructions may include or otherwise have access to computer readable media such as storage media, computer storage media, or data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, module, or both. Any such computer storage media may be part of the computing systems described herein or any component or device accessible or connectable thereto. Examples of components or devices that are part of the computing systems described herein include server machines and computing devices. Any application or module herein described may be implemented using computer readable/executable instructions that may be stored or otherwise held by such computer readable media.
It will be appreciated that different features of the example embodiments of the system and methods, as described herein, may be combined with each other in different ways. In other words, different devices, modules, operations and components may be used together according to other example embodiments, although not specifically stated.
The steps or operations in the flow diagrams described herein are just for example. There may be many variations to these steps or operations without departing from the spirit of the invention or inventions. For instance, the steps may be performed in a differing order, or steps may be added, deleted, or modified.
Although the above has been described with reference to certain specific embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the scope of the claims appended hereto.
This application claims priority to U.S. Provisional Patent Application No. 62/347,846 filed on Jun. 9, 2016, entitled “Prediction System for Geographical Locations of Users Based on Social and Spatial Proximity, and Related Method”, the entire contents of which are incorporated herein by reference.
Number | Date | Country
---|---|---
62347846 | Jun 2016 | US