Mining Correlation Between Locations Using Location History

Information

  • Patent Application
  • 20110208425
  • Publication Number
    20110208425
  • Date Filed
    February 23, 2010
  • Date Published
    August 25, 2011
Abstract
Techniques describe determining a correlation between identified locations to recommend a location that may be of interest to an individual user. The process constructs a location model to identify locations. To construct the model, the process uses global positioning system (GPS) logs of geospatial locations collected over time and identifies trajectories representing trips of the individual user and extracts stay points from the trajectories. Each stay point represents a geographical region where the individual user stayed over a time threshold within a distance threshold. A location history is formulated for the individual user based on a sequence of the extracted stay points to identify locations.
Description
BACKGROUND

A global positioning system (GPS) tracking unit identifies a location or tracks a movement of a vehicle or a person when the vehicle or the person is in close proximity to a GPS device. The location or movement is recorded via GPS devices or phones. GPS information is utilized in navigation systems. For example, individuals may search for information based on their present GPS location for driving or walking directions to a destination location.


The increasing popularity of location-acquisition technologies and their use in people's lives results in GPS information being collected daily. The data collection includes tracking movements of people or vehicles and their visits to various locations. The GPS data may be uploaded to the Internet by people to show their positions, to share travel experiences, and for a variety of other reasons.


The GPS data in raw form is not usable for a number of reasons. One problem with the data in raw form is that there is no semantic meaning to identify the data. For example, there is no indication of whether the location data is for a lake, a restaurant, or a store.


Another problem occurs when individuals enter a building, which causes a loss of the satellite signal. This loss of the satellite signal makes it difficult to identify whether to include the information.


There is an increasing opportunity to find ways to transform the raw data to a usable form and to use the data collected.


SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.


This disclosure describes determining a correlation between locations to recommend a location that may be of interest to an individual user. The recommendation is based on location history of individual users. In one aspect, a location correlation service constructs a location model to identify locations with a time-stamp. To construct the model, the location correlation service uses global positioning system (GPS) logs of geospatial locations collected over time. The location correlation service identifies trajectories representing trips of the individual user and extracts stay points from the trajectories. Each stay point represents a geographical region where the individual user stayed over a time threshold within a distance threshold. A location history is formulated for the individual user based on a sequence of the extracted stay points to identify locations.


In another aspect, a location correlation service determines a correlation between identified locations. The location correlation service accesses the location model to identify locations. The location correlation service integrates travel experiences of individual users who have visited the locations in a weighted manner and identifies a common travel sequence which the individual users followed between the locations. Then, the location correlation service calculates the correlation between the identified locations. The correlation recommends locations that may be of interest to other users.





BRIEF DESCRIPTION OF THE DRAWINGS

The Detailed Description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.



FIG. 1 illustrates an architecture to support an exemplary environment for recommending a location to a user.



FIG. 2 is a flowchart showing an exemplary process of mining correlation between locations, including constructing a location model, inferring each individual user's travel experiences, calculating a correlation between locations, and providing a recommendation.



FIG. 3 is a flowchart showing an exemplary process of constructing the location model.



FIG. 4 illustrates an exemplary process of extracting stay points.



FIG. 5 illustrates an exemplary process of clustering stay points.



FIG. 6 is a flowchart showing an exemplary process of inferring travel experiences of users from their location histories.



FIG. 7 illustrates an exemplary inference model used in the process of FIG. 6.



FIG. 8a illustrates a flowchart showing an exemplary process of determining correlations between locations based at least in part on the location histories.



FIG. 8b illustrates an exemplary process showing correlation between the locations.



FIG. 9 illustrates an exemplary process of calculating the correlation between the locations.



FIG. 10 is a block diagram showing an exemplary location correlation server usable with the environment of FIG. 1.





DETAILED DESCRIPTION
Overview

This disclosure describes identifying a correlation between locations to recommend a location that may be of interest to an individual user. The recommendation is based at least in part on recorded location histories. In one aspect, the location correlation service constructs a location model to identify the locations. The model is constructed by processing global positioning system (GPS) points that tracked the individual user. The tracking of individual users may be made possible through mobile phones via a global system for mobile communications (GSM) network, which leaves positioning logs with a timestamp of each log point. Furthermore, if there is exposure to a GPS satellite, GPS-enabled devices may record latitude and longitude positions. The logs may also be obtained from geo-related web communities, websites, or forums. For the sake of brevity, GPS logs are used as examples in the discussion, but the data may additionally or alternatively include other location data such as from GSM networks, personal area networks, and the like. The individual users will be given notice of the GPS data collection and have the opportunity to provide or to deny consent for tracking purposes. For example, the individual users may choose to opt-in consent or to opt-out consent.


As previously mentioned, the GPS data is not usable in its raw form. The location correlation service described herein identifies trajectories from the GPS logs to transform the raw data into a usable form to construct the location model. Each trajectory represents a single trip for the individual user based on a sequence of time-stamped points. Next, the process extracts stay points from the trajectories. Each stay point represents a geographical region where the individual user has stayed for a predetermined time interval. The process described herein defines a particular semantic meaning for the stay points, such as identifying the stay point as a shopping mall or a restaurant.


Then, the process formulates the individual user's location history based on a sequence of stay points. The individual user's stay points are grouped into clusters, and the two clusters having the greatest number of stay points are removed. Removing the top two clusters of stay points eliminates geographical regions that are private to the individual user, such as the user's home or office.


Furthermore, the location correlation service clusters the stay points from multiple users' trajectories into several geographical regions to identify locations. The clustering is based on a density-based clustering algorithm. Thus, the location model provides valuable meaning to the geographical regions that have been visited by multiple individual users, such as, a set of restaurants, stores along a main street, or an area for tourist attractions.


In another aspect, the location correlation service determines a correlation between identified locations. A location correlation service may access the inference model to infer individual users' travel experiences from their location histories. Based on the model, the process integrates travel experiences of individual users for the locations and identifies a common travel sequence followed between the locations. Then, the location correlation service calculates the correlation between the identified locations. The correlation indicates a relationship between the locations based on human behavior. The location correlation service recommends a location that may be of interest to the user based on the location histories of other users.


While aspects of described techniques can be implemented in any number of different computing systems, environments, and/or configurations, implementations are described in the context of the following exemplary computing environment.


Illustrative Environment


FIG. 1 illustrates an exemplary architectural environment 100, usable to recommend locations that may be of interest to users, based on a correlation between identified locations from a location model. The environment 100 includes an exemplary computing device 102, which is illustrated as a personal digital assistant (PDA). The computing device 102 is configured to connect via one or more network(s) 104 to access a location correlation service 106 for a user 108. The computing device 102 may take a variety of forms, including, but not limited to, a portable handheld computing device (e.g., a personal digital assistant, a smart phone, a cellular phone), a personal navigation device, a laptop computer, a desktop computer, a portable media player, or any other device capable of connecting to one or more network(s) 104 to access the location correlation service 106 for the user 108.


The network(s) 104 represents any type of communications network(s), including wire-based networks (e.g., public switched telephone, cable, and data networks) and wireless networks (e.g., cellular, satellite, WiFi, and Bluetooth).


The location correlation service 106 represents an application service that may be operated as part of any number of online service providers, such as a search engine, map service, social networking site, or the like. Also, the location correlation service 106 may include additional modules or work in conjunction with modules to perform the operations discussed below. In an implementation, the location correlation service 106 may be implemented at least in part by a location application stored in memory of the computing device 102, by an application stored on servers of the location correlation service 106, or both. Updates may be sent for the location application stored on a personal navigation device.


In the illustrated example, the computing device 102 may include a location correlation user interface (UI) 110 that is presented on a display of the computing device 102. The user interface 110 facilitates access to the location correlation service 106 that provides recommendations. In one implementation, the UI 110 is a browser-based UI that presents a page received from the location correlation service 106. The user 108 employs the location correlation UI 110 when viewing a map of a region of interest. The UI 110 may also allow for input of the region of interest by viewing the map. In another implementation, the UI 110 may request and receive input for the region of interest. In an implementation, the location correlation service 106 recommends a location of interest based on the user's present geospatial position. For example, the user interface 110 may display a place of interest, such as “Potomac Overlook Regional Park” to the user 108, based on the user's present geospatial position, a prediction of the user's interest in a location, locations within a threshold, travel time, locations within a predetermined distance from the user's present geospatial location, and/or location histories of other users. The user will be given notice of the GPS tracking their position or location and have the opportunity to provide or to deny consent for tracking purposes. For example, the user may choose to opt-in consent or to opt-out consent.


In the illustrated example, the location correlation service 106 is hosted on one or more location correlation servers, such as server 112(1), 112(2), . . . , 112(S), accessible via the network(s) 104. The location correlation servers 112(1)-(S) may be configured as plural independent servers, or as a collection of servers that are configured to perform larger scale functions accessible by the network(s) 104. The location correlation servers 112 may be administered or hosted by a network service provider that provides the location correlation service 106 to and from the computing device 102.


The location correlation service 106 further includes a location correlation application 114 that executes on one or more of the location correlation servers 112(1)-(S). In an implementation, the location correlation application 114 builds a location model to identify locations, in order to utilize GPS data.


To create the location model, the location correlation application 114 may preprocess the individual user data by collecting global positioning system (GPS) logs. To identify effective individual trips in the geographical locations, the location correlation application 114 identifies or parses trajectories from the logs. The trajectories help transform the raw GPS data to a usable form. The trajectory data is extracted to identify stay points. The extraction of the stay points involves identifying a stay point, which is a geographical region where the individual user has stayed over a time threshold within a distance threshold. The location correlation application 114 helps identify whether to use the stay point as absolute time or to calculate time intervals and associates semantic meaning to the stay points (e.g., whether it is a store or a restaurant). The location correlation application 114 also specifies a location history for the individual user based on a sequence of stay points with corresponding arrival times and departure times. This data is particularly valuable in understanding human behavior.


The location correlation application 114 clusters the stay points based on geographical regions to form clusters of stay points. Then, the location correlation application 114 removes the top two clusters having the greatest number of stay points to eliminate the geographical regions that are private to the individual user. For example, the location correlation application 114 may remove clusters associated with the user's home and office locations.


Furthermore, the location correlation application 114 groups the stay points from multiple users' trajectories into a dataset and clusters the stay points into several geographical regions. The clusters of stay points from the multiple users' trajectories are used to represent locations. The locations may be further grouped into a trip, which is a sequence of locations that are consecutively visited by the individual user.


After the location model has been constructed, the location correlation service 106 is ready to infer the travel experiences of the individual users based on their location histories. The location correlation service 106 may employ an inference model to evaluate the travel experiences of the individual user. Individual travel experience and location interest have a mutual reinforcement relationship. For example, an individual user with rich travel experiences in a region would visit many interesting places in the region, and a very interesting place in that region may be accessed by many individual users with rich travel experiences. To calculate each individual user's travel experience, the location correlation service 106 builds a matrix for location and user and uses a power iteration method to calculate the travel experiences.


Next, the location correlation service 106 may access the location model to infer individual users' travel experiences from their location histories. Using the model, the process integrates travel experiences of the individual users for the locations and identifies a common travel sequence followed by the individual users between the locations. Then, the location correlation service 106 calculates the correlation between the identified locations. The correlation indicates a relationship between the locations based on human behavior. Based on the correlations, the location correlation service 106 recommends a location to the user.


In the illustration, the user 108 accesses the location correlation service 106 via the network 104 using their computing device 102. The location correlation service 106 presents the user interface (UI) 110 to receive a user query for a location of interest or to provide a recommendation for the location of interest. In an implementation, the user 108 accesses a map for a particular region. Upon activating the particular region on the map, the location correlation service 106 may provide recommendations of locations of interest to the user 108, based on the location correlation results.


In the example illustrated in FIG. 1, the user 108 may receive a location of “Potomac Overlook Regional Park” based on his present geospatial location. Once the location is recommended, the user may submit a query by actuating a button “Find Similar Locations” on the UI 110. Based on the user query, the location correlation application 114 searches the correlation results to find another location.


The environment 100 may include a database 116, which may be stored on a separate server or the representative set of servers 112 that is accessible via the network(s) 104. The database 116 may store information, such as logs for the individuals which include a sequence of global positioning system (GPS) points, a trajectories archive, location models, locations identified by the model, a map generated of locations visited, mined location correlation results, and the like. In this implementation, the location model and the location correlation results are stored in the database 116 and are updated on a predetermined time interval.



FIG. 2 is a flowchart showing an exemplary process 200 of high-level functions performed by the location correlation service 106. The process 200 may be divided into four phases: an initial phase to construct a location model 202, a second phase to infer each user's travel experiences 204, a third phase to calculate a correlation between the locations 206, and a fourth phase to recommend a location 208. The phases may be used in the environment of FIG. 1. These phases may be performed separately or in combination.


The first phase is constructing the location model of each individual user's location history 202. The process collects GPS logs of geospatial locations of the individual user. This disclosure describes transforming the GPS data into a form that may be readily used to construct the location model. For example, the location model may be based on identifying trajectories and stay points from logs, associating location histories and locations from stay points, identifying trips and users. The location model may be constructed by the location correlation service 106 or in conjunction with a location model module. Additional details of constructing the location model of each individual user's location history 202 can be found in the discussion of FIGS. 3-5 below.


The second phase, inferring each individual user's travel experience in a given region 204, is performed using an inference model. The process builds an adjacent matrix between the individual users and locations of interest for locations visited by the individual user. The individual user's travel experience and the locations of interest have a mutual reinforcement relationship. Thus, a power iteration process calculates each individual user's travel experience and each location of interest to be used as input for a correlation. Additional details of inferring the travel experiences by using individual user data 204 can be found in the discussion of FIGS. 6 and 7 below.
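As a non-limiting illustration, the mutual reinforcement between travel experience and location interest may be sketched with a simple power iteration. The update rule shown here (a location's interest is the visit-weighted sum of its visitors' experience, and a user's experience is the visit-weighted sum of the interest of the locations visited) is an assumption made for illustration, as are the names `mutual_reinforcement` and `visits` and the sample visit counts; the inference model itself is described with FIGS. 6 and 7.

```python
import math

def mutual_reinforcement(visits, iterations=50):
    """Power iteration over a user-by-location visit matrix.

    visits[u][l] is the number of times user u visited location l
    (hypothetical input layout). Returns (experience, interest),
    each normalized to unit Euclidean length.
    """
    n_users, n_locs = len(visits), len(visits[0])
    experience = [1.0] * n_users
    interest = [1.0] * n_locs

    def normalize(vec):
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm > 0 else vec

    for _ in range(iterations):
        # A location is interesting if experienced users visit it.
        interest = normalize([
            sum(visits[u][l] * experience[u] for u in range(n_users))
            for l in range(n_locs)
        ])
        # A user is experienced if the user visits interesting locations.
        experience = normalize([
            sum(visits[u][l] * interest[l] for l in range(n_locs))
            for u in range(n_users)
        ])
    return experience, interest

# Illustrative data only: 3 users, 3 locations.
visits = [[2, 1, 1], [1, 0, 0], [1, 1, 0]]
experience, interest = mutual_reinforcement(visits)
```

In this sample, the first user visits every location the others visit and more, so that user ends with the largest experience score, and the first location ends with the largest interest score.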


The third phase, calculating the correlation between the locations by integrating travel experiences 206, uses a location correlation algorithm. The correlation takes into consideration the user's travel experiences and a sequence of the locations in the individual user's trip. Furthermore, the correlation is based on category similarity and the geographical distance between the locations. This information may be stored in the database 116 for easy access by the location correlation service. Additional details of correlating locations by integrating the travel experiences of the users 206 can be found in the discussion of FIG. 8a below.


The fourth phase is to provide a recommendation for a location 208 based on the correlation data between locations. The recommendation may occur when the user is accessing a map of the region, accessing websites, submitting a query, or based on the user's geospatial location. Additional details of recommending a place of interest 208 can also be found in the discussion of FIG. 8b below.


Exemplary Processes


FIGS. 3, 6, and 8a are flowcharts showing exemplary processes for constructing the location model of each individual's location history 202, inferring users' travel experiences from their location histories 204, and calculating a correlation between the locations by integrating the travel experiences of the users 206, respectively. The processes are illustrated as a collection of blocks in logical flowcharts, which represent a sequence of operations that can be implemented in hardware, software, or a combination. For discussion purposes, the processes are described with reference to the computing environment 100 shown in FIG. 1. However, the processes may be performed using different environments and devices. Moreover, the environments and devices described herein may be used to perform different processes.


For ease of understanding, the methods are delineated as separate steps represented as independent blocks in the figures. However, these separately delineated steps should not be construed as necessarily order dependent in their performance. The order in which the process is described is not intended to be construed as a limitation, and any number of the described process blocks may be combined in any order to implement the method, or an alternate method. Moreover, it is also possible for one or more of the provided steps to be omitted.



FIG. 3 is a flowchart illustrating an exemplary process 300 of preprocessing raw GPS data to model each individual user's location history with time-stamped locations. The process 300 constructs a location model by collecting or receiving GPS logs of geospatial locations of individual users 302. The logs may be obtained from GPS sensors, tracking units, mobile phones, or any other device, as long as these devices are located in close proximity to each of the individuals. The GPS log is generally a collection of GPS points, which include a date, a time, a longitude, and a latitude. A GPS log may include a set P of points represented by P={p1, p2, . . . , pn}, where each GPS point pi ∈ P contains latitude (pi.Lat), longitude (pi.Lngt) and timestamp (pi.T) values, such as a date and a time.


In an implementation, the location correlation service 106 may obtain GPS logs from GPS-log driven applications on the web. Each individual user may be equipped with a GPS device for tracking data. The device may include a GPS navigation device, a GPS phone, or any other type of GPS sensor that collects GPS log data at a high sampling rate, such as every two to eight seconds per point. The GPS devices may be set to automatically track the position of the device at regular intervals.


As mentioned, a problem with raw GPS data is that it is not in a usable form. In the examples described herein, the raw data from the GPS logs is first transformed into a form that may be readily used to construct the location model. Modeling includes parsing the GPS logs of each individual user to identify trajectories 304. The trajectories are identified from the GPS logs to provide a representation of individual trips for the individual user.


An individual user's trajectory is a sequence of time-stamped points. The trajectory may be represented by:

  • Traj=(p0, p1, . . . , pk), where pi=(xi, yi, ti) (i=0, 1, . . . , k), ti is a timestamp, ∀0≦i<k, ti<ti+1, and (xi, yi) are the two-dimensional coordinates of the point.
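By way of example only, the trajectory definition above may be sketched as a sequence of time-stamped points whose timestamps strictly increase. The names `Point` and `is_trajectory`, the use of seconds for timestamps, and the sample coordinates are assumptions made for illustration.

```python
from typing import NamedTuple, Sequence

class Point(NamedTuple):
    x: float  # first coordinate of the point
    y: float  # second coordinate of the point
    t: float  # timestamp, assumed here to be in seconds

def is_trajectory(points: Sequence[Point]) -> bool:
    """Return True when every timestamp is strictly earlier than the next one."""
    return all(a.t < b.t for a, b in zip(points, points[1:]))

# Illustrative data only.
traj = [Point(116.30, 39.98, 0.0), Point(116.31, 39.98, 5.0), Point(116.32, 39.99, 10.0)]
```

A log whose points repeat or go backwards in time would fail this check and would need to be sorted or cleaned before stay points are extracted.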


In building the location model, the process extracts stay points from the trajectory of each individual user 306. Each stay point gives semantic meaning to the raw point in the trajectory. For example, the stay point may indicate a store, a restaurant, a school, and the like. The stay point s represents a geographical region where the individual user stayed over a time period. The time period may be based at least in part on a time threshold (Tr) within a distance threshold (Dr). In one specific implementation, the time threshold is 20 minutes and the distance threshold is 250 meters. However, in other implementations, other time and distance thresholds may be used. Based on data for the individual user, such as staying over 20 minutes within a distance of 200 meters, the stay point is identified as a geographical region. However, if the data for the individual user indicates that the user stayed about 10 minutes within a distance of 190 meters, there is no stay point detection. In that case, for example, the individual user may be at a street crossing waiting for traffic lights.


In the individual user's trajectory, a stay point s is characterized by a set of consecutive points:

$$P = (p_m, p_{m+1}, \ldots, p_n),$$

where $\forall\, m < i \le n$, $\mathrm{Dist}(p_m, p_i) \le D_r$, $\mathrm{Dist}(p_m, p_{n+1}) > D_r$, and $\mathrm{Int}(p_m, p_n) \ge T_r$. Therefore, $s = (x, y, t_a, t_l)$, where

$$s.x = \sum_{i=m}^{n} p_i.x \,/\, |P|, \qquad s.y = \sum_{i=m}^{n} p_i.y \,/\, |P|$$

respectively stand for the average x and y coordinates of the collection P; $s.t_a = p_m.t_m$ is the individual user's arriving time on s and $s.t_l = p_n.t_n$ represents the individual user's leaving time. A diagram illustrating the GPS log and stay points is shown in FIG. 4.
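As a non-limiting illustration, the stay point definition above may be sketched as follows, using the 20 minute and 250 meter thresholds of the specific implementation as defaults. The names `GpsPoint`, `StayPoint`, `haversine_m`, and `extract_stay_points` are hypothetical, as is the choice of the haversine formula for Dist and the treatment of a group that runs to the end of the trajectory (it is accepted even though no later point exists to break the distance threshold).

```python
import math
from typing import List, NamedTuple

class GpsPoint(NamedTuple):
    lat: float  # degrees
    lon: float  # degrees
    t: float    # timestamp in seconds

class StayPoint(NamedTuple):
    lat: float     # mean latitude of the grouped points
    lon: float     # mean longitude of the grouped points
    arrive: float  # timestamp of the first grouped point
    leave: float   # timestamp of the last grouped point

def haversine_m(a: GpsPoint, b: GpsPoint) -> float:
    """Great-circle distance in meters (one possible choice for Dist)."""
    r = 6371000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def extract_stay_points(points: List[GpsPoint],
                        dist_thresh_m: float = 250.0,
                        time_thresh_s: float = 20 * 60) -> List[StayPoint]:
    stays = []
    m, count = 0, len(points)
    while m < count:
        # Extend n while the next point stays within the distance threshold of p_m.
        n = m
        while n + 1 < count and haversine_m(points[m], points[n + 1]) <= dist_thresh_m:
            n += 1
        if points[n].t - points[m].t >= time_thresh_s:
            group = points[m:n + 1]
            stays.append(StayPoint(
                lat=sum(p.lat for p in group) / len(group),
                lon=sum(p.lon for p in group) / len(group),
                arrive=points[m].t,
                leave=points[n].t,
            ))
            m = n + 1  # continue after the stay
        else:
            m += 1     # p_m does not anchor a stay; try the next point
    return stays
```

Because the time test compares only the first and last timestamps of the group, a gap in the log caused by a lost satellite signal indoors still yields a stay point when the points before and after the gap are close together.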


When stay points are identified, a sequence of stay points is formulated to represent a location history of the individual user 308. Each stay point corresponds to a location visited by the individual user with corresponding arrival and departure times. The individual user's location history, h, is represented by:

$$h = \left( s_0 \xrightarrow{\Delta t_1} s_1 \xrightarrow{\Delta t_2} \cdots \xrightarrow{\Delta t_{n-1}} s_n \right) \tag{1}$$

where $\forall\, 0 \le i < n$, $s_i$ is a stay point and $\Delta t_i = s_{i+1}.t_a - s_i.t_l$ is the time interval between two stay points.
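By way of example only, the time intervals of the location history may be computed as the arrival time at the next stay point minus the leaving time of the current stay point. The names `StayPoint` and `travel_intervals` and the sample values are assumptions made for illustration.

```python
from typing import List, NamedTuple

class StayPoint(NamedTuple):
    x: float
    y: float
    t_a: float  # arriving time
    t_l: float  # leaving time

def travel_intervals(history: List[StayPoint]) -> List[float]:
    """Interval between consecutive stay points: next arrival minus current leaving."""
    return [nxt.t_a - cur.t_l for cur, nxt in zip(history, history[1:])]

# Illustrative data only: three stay points, times in seconds.
h = [StayPoint(0, 0, 0, 1800), StayPoint(1, 1, 2400, 4000), StayPoint(2, 2, 4600, 9000)]
intervals = travel_intervals(h)  # [600, 600]
```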


In addition, the stay points of each individual user are clustered to form clusters of the stay points 310. The clusters of the stay points of the individual users are further filtered. The filtering removes the top two clusters having the greatest number of stay points, from the clustering results of the individual user. The filtering protects the individual user's privacy, such as removing their home and workplace from the cluster of stay points. The stay points are reclustered after the removal. After the clustering of the stay points to form clusters, the process transforms the individual stay point sequence into a location history sequence. Each stay point is substituted by the cluster it pertains to, with arrival and departure times of the stay point retained and associated with the cluster. A diagram of the clustering of stay points of individual users is shown in FIG. 5.
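As a non-limiting illustration, the filtering step may be sketched as dropping every stay point that belongs to one of the two most populated clusters of a single user. The function name `drop_top_clusters`, the use of string labels, and the sample labels are hypothetical; how ties between equally populated clusters are broken is not addressed here and follows the order in which labels are first seen.

```python
from collections import Counter
from typing import Hashable, List, Sequence

def drop_top_clusters(labels: Sequence[Hashable], top_k: int = 2) -> List[int]:
    """Return the indices of stay points outside the top_k most populated clusters."""
    counts = Counter(labels)
    private = {label for label, _ in counts.most_common(top_k)}
    return [i for i, label in enumerate(labels) if label not in private]

# Illustrative data only: one cluster label per stay point of a single user.
labels = ["A", "A", "A", "A", "B", "B", "B", "C", "D", "C"]
kept = drop_top_clusters(labels)  # [7, 8, 9]
```

The stay points that remain would then be reclustered, as described above.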


In some instances, the location histories of the individual users may tend to be inconsistent, as the stay points detected from various individual users' trajectories are not identical. To address this inconsistency, the stay points that are identified from all of the individual users' trajectories are grouped into a dataset S and clustered 312. Thus, the stay points from multiple individual users are clustered into clusters of several geographical areas 312 by a clustering algorithm.


The clustering uses a density-based clustering algorithm, such as Ordering Points To Identify the Clustering Structure (OPTICS), to cluster the individual user stay points and to cluster the multiple users' stay points into clusters of geospatial regions. OPTICS may detect clusters with irregular structures, such as a shopping street or a set of nearby restaurants. This approach helps filter out sparsely distributed stay points and helps ensure that each cluster has been accessed by multiple users.


The two parameters used in OPTICS are a core-distance (dc) and a minimum number of points (minPt) falling within this core-distance. The OPTICS algorithm clusters the geographical regions into clusters by grouping and identifying similar places visited by the individual users. For example, stay points of the same place are directly clustered into a density-based cluster. In addition, clusters with valuable semantics and irregular structures, such as a set of restaurants or travelling areas near a lake, may also be detected by using the OPTICS clustering method. In response to the stay points of multiple users being clustered together into a cluster, geographical regions are identified 314 by the location correlation service 106. The stay points from the multiple users that are similar in coordinate location or type of classification may be assigned to a same cluster. These geographical regions are identified to be used for correlation.
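By way of example only, the role of a distance parameter and a minimum number of points in density-based clustering may be sketched as follows. This sketch is a simplified DBSCAN-style procedure used as a stand-in, not an implementation of OPTICS; the name `density_cluster`, the parameter names `eps` and `min_pts`, the label -1 for sparse points, and the assumption that stay points are given as planar coordinates in meters are all made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

def density_cluster(points: Sequence[Tuple[float, float]],
                    eps: float, min_pts: int) -> List[int]:
    """DBSCAN-style clustering. Returns one label per point; -1 marks sparse points."""
    n = len(points)

    def neighbors(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # too sparse to start a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # sparse point reachable from a dense one
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # dense point: keep growing the cluster
    return labels

# Illustrative data only: two dense groups and one isolated stay point.
stay_points = [(0, 0), (10, 0), (0, 10), (10, 10),
               (500, 500), (510, 500), (500, 510),
               (2000, 2000)]
labels = density_cluster(stay_points, eps=20, min_pts=3)
```

Because a cluster grows from one dense point to the next, elongated groups such as stay points along a street can end up in a single cluster, while the isolated point is filtered out.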


The cluster of stay points detected from the multiple users' trajectories is defined as a collection of locations. The collection of locations L may be represented by:

$$L = \{ l_0, l_1, \ldots, l_n \}$$

where $\forall\, 0 \le i \le n$, $l_i = \{ s \mid s \in S \}$, and for $i \ne j$, $l_i \cap l_j = \emptyset$.


After the clustering, a stay point in the user's location history may be substituted with the cluster ID. The individual user's location history may be represented as a sequence of the locations. Supposing $s_0 \in l_i$, $s_1 \in l_j$, $s_n \in l_k$, where each s is a stay point, the individual location history shown as equation (1) above may be rewritten as:

$$h = \left( l_i \xrightarrow{\Delta t_1} l_j \xrightarrow{\Delta t_2} \cdots \xrightarrow{\Delta t_{n-1}} l_k \right). \tag{2}$$







The individual users' location histories may be compared and integrated to infer the correlation between locations.
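As a non-limiting illustration, the collection of locations and the substitution of stay points with cluster IDs may be sketched as follows. The names `build_locations` and `to_location_history`, the representation of a stay point as an index into the dataset S, and the convention that a label of -1 marks a stay point belonging to no cluster are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

def build_locations(labels: Sequence[int]) -> Dict[int, Set[int]]:
    """Group stay-point indices by cluster ID; each index lands in exactly one set."""
    locations: Dict[int, Set[int]] = defaultdict(set)
    for idx, label in enumerate(labels):
        if label != -1:
            locations[label].add(idx)
    return dict(locations)

def to_location_history(stays: Sequence[Tuple[int, float, float]],
                        labels: Sequence[int]) -> List[Tuple[int, float, float]]:
    """Replace (stay_index, arrive, leave) with (cluster_id, arrive, leave)."""
    return [(labels[i], t_a, t_l) for i, t_a, t_l in stays if labels[i] != -1]

# Illustrative data only: cluster label for each stay point in S.
labels = [0, 0, 1, -1, 1, 2]
locations = build_locations(labels)
history = to_location_history([(0, 0, 10), (2, 20, 30), (3, 40, 50), (5, 60, 70)], labels)
```

Since every stay point carries a single label, the resulting sets are pairwise disjoint, and the arrival and departure times of each stay point are retained with its cluster ID.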


The model identifies a trip as a sequence of locations consecutively visited by the individual user. The trip may be represented by:

$$\mathrm{Trip} = \left( l_0 \xrightarrow{\Delta t_1} l_1 \xrightarrow{\Delta t_2} \cdots \xrightarrow{\Delta t_{k-1}} l_k \right)$$

where $\forall\, 0 \le i \le k$, $\Delta t_i < T_p$ (a threshold) and $l_i \in L$ is a stay-point-cluster ID. In general, the individual user's location history may be regarded as a collection of trips, $h = \{\mathrm{Trip}\}$, and each $\mathrm{Trip} = (l_i \to l_j \to \cdots)$ is a sequence of locations represented by clusters of stay points.


The location correlation service 106 may use the trip data to further identify whether the travel time spent between two consecutive stay points indicates that the location history is to be kept as a single trip or divided into more than one trip. For example, if the travel time spent between two consecutive stay points in the location history of the individual user exceeds a predetermined threshold, in response, the location history of the individual user may be partitioned into more than one trip. On the other hand, if the travel time spent between two consecutive stay points does not exceed the predetermined threshold, the location history of the individual may be left as a single trip.
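The trip partition described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Visit` record, the `partition_trips` name, and the use of seconds for all times are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    """One entry of a location history (hypothetical record layout)."""
    location: str   # stay-point-cluster ID
    arrive: float   # arrival time, in seconds
    leave: float    # departure time, in seconds


def partition_trips(history: List[Visit], tp: float) -> List[List[Visit]]:
    """Split a location history into trips.

    A new trip starts whenever the travel time between two consecutive
    visits exceeds the threshold tp; otherwise the visit stays in the
    current trip.
    """
    trips: List[List[Visit]] = []
    current: List[Visit] = []
    for visit in history:
        if current and visit.arrive - current[-1].leave > tp:
            trips.append(current)
            current = []
        current.append(visit)
    if current:
        trips.append(current)
    return trips
```

For instance, with a one-hour threshold, a history that visits A and B ten seconds apart and then reaches C more than an hour later would be split into the two trips (A, B) and (C).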


Next, the location correlation application 114 defines a collection of users. The collection of users U may be represented as: U={u0, u1, . . . , um}. ∀0≦k≦m, uk ∈ U is an individual user having a trajectory Trajk, a location history hk, and a certain travel experience ek.


The location model identifies locations based on each individual user's location history data. These locations are saved in the database 116 for further processing or may be used by the location correlation service 106.



FIG. 4 illustrates an exemplary process 400 of extracting stay points from GPS logs 402. The data collected is a sequence of time-stamped points, shown as P={p1, p2, . . . pn}. Each point pi ∈ P contains the latitude (pi.Lat), the longitude (pi.Lngt), and the timestamp (pi.T).


Shown in the lower diagram 404, the process connects the GPS points, p1, p2, p3, . . . p9, according to their time series, into a GPS trajectory. As mentioned previously, the process extracts stay points based on the spatial and temporal values of the GPS points.


At 406, the stay point 1 is the geographical region where the individual user has remained stationary indoors at P3 for over a threshold time period. As mentioned, stay points are detected based on the time threshold within the distance threshold. For example, this type of stay point may occur when the individual user enters a building, causing the satellite signal to be lost. Once the individual returns outdoors, the satellite signal is detected again. Thus, stay point 1 is considered a geographical region (in this case, the location(s) where the signal was lost and regained) to be used in the location model.


At 408, the stay point 2 is the geographical region where the individual user may wander around within a spatial region for over a time period. The process constructs the stay point using the mean longitude and latitude of the GPS points within the region. Typically, stay points of this type occur when the individual wanders around outdoor places where the satellite signal is detected, like a park, a campus, and the like.
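The two kinds of stay points described above can be detected with one scan over the time-ordered GPS points. The sketch below is an assumption-laden illustration: the `GpsPoint` and `StayPoint` records, the haversine distance, and the choice to measure distance from the first point of each candidate region are all choices made for this example rather than details stated in the text.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GpsPoint:
    lat: float   # latitude in degrees
    lngt: float  # longitude in degrees
    t: float     # timestamp in seconds


@dataclass
class StayPoint:
    lat: float    # mean latitude of the region
    lngt: float   # mean longitude of the region
    arv_t: float  # arrival time
    lev_t: float  # leaving time


def haversine_m(a: GpsPoint, b: GpsPoint) -> float:
    """Great-circle distance in metres between two GPS points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lngt - a.lngt)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def detect_stay_points(points: List[GpsPoint], t_r: float, d_r: float) -> List[StayPoint]:
    """Return stay points where the user remained within d_r metres
    of an anchor point for at least t_r seconds."""
    stays: List[StayPoint] = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the region while points stay within d_r of the anchor points[i].
        while j < n and haversine_m(points[i], points[j]) <= d_r:
            j += 1
        if points[j - 1].t - points[i].t >= t_r:
            group = points[i:j]
            stays.append(StayPoint(
                lat=sum(p.lat for p in group) / len(group),
                lngt=sum(p.lngt for p in group) / len(group),
                arv_t=group[0].t,
                lev_t=group[-1].t,
            ))
            i = j  # continue after the detected region
        else:
            i += 1
    return stays
```

A signal gap inside a building shows up here as two nearby points with a long time difference, so it is captured by the same test as outdoor wandering.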



FIG. 5 illustrates an exemplary process 500 of clustering stay points of the geographical regions of the individual user. All of the stay points, s1, s2, s3, . . . , s11, associated with the individual user are put into a dataset and clustered into clusters, c1, c2, c3, . . . c5, of several geographical regions. The clustering algorithm clusters the stay points by grouping and identifying similar places visited by the individual user.


Stay points are illustrated at S1, S2, . . . . S9 by 502. The stay point sequence S=(s1, s2, s3, . . . , sn) represents the location history of the individual user. Each stay point si corresponds to some geographical region and a common travel sequence to be followed by individual users. There would be corresponding times for each stay point, si·arvT and si·levT of arriving and leaving a place. The process applies density-based clustering by clustering the stay points into clusters of several geographical regions.


After the clustering of the stay points, the process transforms the individual stay point sequence into a location history sequence C={c1, c2, c3, . . . , cn}. The clusters are illustrated at C4 and C5 by 504. Each stay point is substituted by the cluster to which it pertains, with the arrival and departure times of the stay point retained and associated with the cluster. For example, stay points S1 and S2 may be substituted by C1, which is the cluster in which S1 and S2 are located.
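The substitution step can be illustrated with a short sketch. It assumes the clustering has already produced a mapping from stay point IDs to cluster IDs; the `Stay` and `LocationVisit` records, the `to_location_history` name, and the decision to drop stay points that belong to no cluster are assumptions for this example.

```python
from typing import Dict, List, NamedTuple


class Stay(NamedTuple):
    stay_id: str
    arv_t: float
    lev_t: float


class LocationVisit(NamedTuple):
    cluster_id: str
    arv_t: float
    lev_t: float


def to_location_history(stays: List[Stay], cluster_of: Dict[str, str]) -> List[LocationVisit]:
    """Replace each stay point by its cluster ID, keeping arrival and
    leaving times. Stay points without a cluster are skipped (assumption)."""
    return [
        LocationVisit(cluster_of[s.stay_id], s.arv_t, s.lev_t)
        for s in stays
        if s.stay_id in cluster_of
    ]
```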


Infer Travel Experience


FIG. 6 is a flowchart showing an exemplary process 600 for inferring travel experiences of users from their location histories. The location correlation service 106 employs an inference model to infer individual user's travel experiences from their location histories 602.


The inference model regards the individual user's stay on a location as an implicitly directed link from the user to that location, i.e., an individual user would point to many locations and a location would be pointed to by many users. The user travel experience E and the location interest T have a mutual reinforcement relationship. The individual user with rich travel experiences in a region would visit many interesting places in that region, and a very interesting place in that region might be accessed by many individual users with rich travel experiences. More specifically, an individual user's travel experience may be represented by the sum of the interests of the locations accessed; the interest of a location may be calculated by integrating the experiences of the individual users visiting it. Using a power iteration method, each user's travel experience and each location's interest may be calculated. A diagram of the inference of travel experience and location interest is shown in FIG. 7.


Given a collection of individual users U's location histories H, the process 600 may build an adjacency matrix M between users and locations 604. In this matrix, an item rij stands for the number of times that ui has stayed in location lj, 0≦i<|U|, 0≦j<|L|. For instance, the matrix may be represented as:
















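\[
M = \begin{array}{c|ccccc}
 & l_0 & l_1 & l_2 & l_3 & l_4 \\ \hline
u_0 & 1 & 1 & 0 & 0 & 0 \\
u_1 & 1 & 1 & 2 & 0 & 0 \\
u_2 & 0 & 0 & 1 & 0 & 2 \\
u_3 & 0 & 0 & 0 & 1 & 1
\end{array}
\]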
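A user-location matrix of this kind can be built by counting stays. The sketch below assumes each location history is already a list of location IDs; the function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def build_user_location_matrix(
    histories: Dict[str, List[str]],
    users: List[str],
    locations: List[str],
) -> List[List[int]]:
    """Return M where M[i][j] counts how often users[i] stayed in locations[j]."""
    column = {loc: j for j, loc in enumerate(locations)}
    matrix = [[0] * len(locations) for _ in users]
    for i, user in enumerate(users):
        for loc in histories.get(user, []):
            matrix[i][column[loc]] += 1
    return matrix
```

With four users whose histories are (l0, l1), (l0, l1, l2, l2), (l2, l4, l4), and (l3, l4), this counting yields a matrix with rows (1, 1, 0, 0, 0), (1, 1, 2, 0, 0), (0, 0, 1, 0, 2), and (0, 0, 0, 1, 1).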



























Then, the mutual reinforcement relationship of the individual user travel experience E=(e0, e1, . . . , em) and location interest T=(I0, I1, . . . , In) is represented 606 as follows:






\[ e_i = \sum_{l_j \in L} r_{ij} \times I_j; \]

\[ I_j = \sum_{u_i \in U} r_{ji} \times e_i; \]


where ei stands for ui's travel experience and Ij denotes the location interest of lj. Writing the user travel experience and location interest in matrix form 608 gives:






\[ E = M \cdot T, \]

\[ T = M^{T} \cdot E. \]


The inference model uses Tn and En to denote location interests and travel experiences at the nth iteration. The iterative processes for generating the final results are:






\[ T_n = M^{T} \cdot M \cdot T_{n-1} \]

\[ E_n = M \cdot M^{T} \cdot E_{n-1} \]


Starting with T0=E0=(1, 1, . . . , 1), the process calculates the final results using the power iteration method 610. The algorithm may perform w rounds before converging. The computing complexity of this method is O(2w|L||U|). The algorithm depicting the iterative process is shown below.












InferUserExperience(U, L, H)

Input: A collection of users U, their location histories H, and a collection of locations L detected from H.

Output: The collection of users' travel experiences E=(e0, e1, ..., em).

1. τ0=E0=(1, 1, ..., 1);
2. k=1;
3. Do
4.   τk=M^T·M·τk−1;
5.   Ek=M·M^T·Ek−1;
6.   τk=τk/||τk||1; //normalization
7.   Ek=Ek/||Ek||1; //normalization
8. While ||Ek−Ek−1||1>εe or ||τk−τk−1||1>εl
9. Return Ek;









Using the power iteration method, it is possible to generate the final scores for each user's travel experience and each location's interest, and to rank the top n interesting locations and the top k experienced users in a given region.
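The iteration can be sketched in plain Python as follows. This is an illustrative reading of the procedure, not a definitive implementation: the function names, the default tolerances, and the `max_rounds` safety cap are assumptions added for the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def l1_normalize(v: Vector) -> Vector:
    total = sum(abs(x) for x in v)
    return [x / total for x in v] if total else v


def l1_distance(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def infer_user_experience(
    m: Matrix, eps_e: float = 1e-9, eps_l: float = 1e-9, max_rounds: int = 1000
) -> Tuple[Vector, Vector]:
    """Power iteration on the user-location matrix m.

    Returns (E, T): L1-normalized travel experiences and location interests.
    """
    mt = transpose(m)
    e = [1.0] * len(m)    # E0 = (1, 1, ..., 1)
    t = [1.0] * len(mt)   # T0 = (1, 1, ..., 1)
    for _ in range(max_rounds):  # cap is an added safeguard (assumption)
        t_next = l1_normalize(mat_vec(mt, mat_vec(m, t)))  # T_n = M^T M T_{n-1}
        e_next = l1_normalize(mat_vec(m, mat_vec(mt, e)))  # E_n = M M^T E_{n-1}
        converged = (l1_distance(e_next, e) <= eps_e
                     and l1_distance(t_next, t) <= eps_l)
        e, t = e_next, t_next
        if converged:
            break
    return e, t
```

Sorting the returned vectors in descending order gives the ranking of experienced users and interesting locations. For the four-user, five-location example matrix, the second user and the third location receive the highest scores.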



FIG. 7 illustrates an exemplary process 700 of the inference model. Shown are user travel experience and locations of interest along the left side. A location is a cluster of stay points, l0, l1, l2, . . . ln. The individual user's visit to the location is viewed as an implicitly directed link 702 that extends from the individual user u3 to the location l3. Shown at 704 is l0, which contains two stay points, one each from u0 trajectory 706 and from u1 trajectory 708. This illustrates that the users access many locations, and the location is visited by many users. This is an illustration of the mutual reinforcement relationship.


Correlating Locations


FIG. 8a illustrates a flowchart showing an exemplary process 800 for correlating between the locations that have been identified through the location model. An algorithm computes the correlation between the locations by evaluating the individual user travel experience and the sequence of locations that have been visited 802.


The correlation between two locations depends on the number of users visiting the locations in a trip and relies on the individual users' travel experiences. Two locations that are consecutively accessed by the individual user may be more correlated than those visited non-consecutively. The correlation between the two locations may be calculated by integrating the travel experiences of the users U′ who have visited the locations in a trip in a weighted manner 804.


To calculate a correlation between locations A and B, the location correlation service 106 may use the following equation 806:





\[ Cor(A, B) = \sum_{u_k \in U'} \alpha \cdot e_k, \]


where U′ is the collection of users who have visited locations A and B in a trip, ek is uk's travel experience, uk ∈ U′, and 0<α≦1 is a damping factor, which may decrease as the interval between the indices of these two locations in a trip increases. For example, α may be set to 2^(−(|j−i|−1)), where i and j are the indices of locations A and B in the trip to which they pertain. That is, the more discontinuously the two locations are accessed by the user (|i−j| may be large, thus α may become small), the less contribution the user may offer to the correlation between these two locations. The location correlation service 106 determines the correlation between the locations 808.



FIG. 8b shows an exemplary process of correlating the locations. The diagram 810 shows an illustration of the location correlation process 800, which calculates the correlation between locations. The correlations between locations cover category similarity and geographical distance between locations based on human behavior. Thus, correlation may be discovered from location history. The correlation enables many valuable services, such as location recommendation systems, sales promotion, bus route design, mobile tour guides, and the like.


Correlations between the locations may identify locations that are similar in type, in close proximity to each other, and/or correlated from a perspective of human behavior. Human behavior identifies location histories implying key factors, such as travel time, distance, accessibility, and sequence between the locations. If the individual user visited location A and then location B, it is presumed these two locations are within travel distance of each other. If there are additional data indicating individual users tend to follow the sequence from A to B, this may imply a one-way road.


For example, the correlation process shows “Smithsonian” as being highly correlated to “Arlington National Cemetery” based on mining correlation data which shows individual users tend to visit both locations. Both of these locations have been clustered as tourist attractions, and as being near each other in the DC metro area. In another correlation, the “Potomac Overlook Regional Park” is also highly correlated to “Arlington National Cemetery”, based on analyzing individual user's location histories. Again, both of these are tourist attractions and located near each other. Thus, the “Smithsonian” and/or the “Potomac Overlook Regional Park” may be recommended to tourists whenever they travel to visit “Arlington National Cemetery”. If the user 108 activates a map or a website for “Arlington National Cemetery”, the location correlation service 106 may recommend “Potomac Overlook Regional Park”. Or as mentioned, if the user's current geo-spatial position is close to “Potomac Overlook Regional Park”, it may also appear as a recommended location. Otherwise, people might miss opportunities to visit sites that may be easily accessible and a place of interest with similar group identification.


In another implementation, the location correlation service 106 may find correlations among locations that are not similar in business categories but tend to follow a common travel sequence between locations. Restaurants are classified under the food category while museums and theatres are classified under entertainment. For example, the user 108 may be interested in going out to dinner at a restaurant first and then attending a show at the theatre. The location correlation service 106 recognizes the sequence and makes recommendations based on this. If the user 108 activates the map for directions for the restaurant or the theatre, the location correlation service 106 may recommend the other location. Thus, there are many advantages of correlating between locations, such as to gain knowledge from travel experiences of individuals with a higher knowledge of the region and to understand travel sequences between the locations.



FIG. 9 illustrates an exemplary process 900 for calculating the correlation between the locations. Shown are three users (u1, u2, u3) who access three locations (A, B, C). The three users may access the locations in different manners to illustrate three trips (Trip1, Trip2, Trip3). As shown in FIG. 9, the number shown below each node denotes the index of this node in the sequence. The sequence for u1 on Trip1 may be A, B, C at 902; the sequence for u2 on Trip2 may be A, C, B at 904; and the sequence for u3 on Trip3 may be B, A, C at 906.


Using the correlation equation shown below:





\[ Cor(A, B) = \sum_{u_k \in U'} \alpha \cdot e_k, \]


and with information from Trip1, the location correlation application 114 may calculate Cor(A,B)=e1 and Cor(B,C)=e1, since these locations have been consecutively accessed by u1 (i.e., α=1). However, Cor(A,C)=½·e1 (i.e., α=2^(−(|2−0|−1))=½) as u1 traveled to B before visiting C. Thus, the correlation between locations A and C from Trip1 may not be as strong as the correlation between A and B, as they are not consecutively visited by u1. Similarly, the location correlation application 114 may infer Cor(A,C)=e2, Cor(C,B)=e2, Cor(A,B)=½·e2 from Trip2, and Cor(B,A)=e3, Cor(A,C)=e3, Cor(B,C)=½·e3 from Trip3. Later, the location correlation application 114 integrates these correlations that are inferred from each individual user's trips and obtains the following results:









\[
\begin{aligned}
Cor(A, B) &= e_1 + \tfrac{1}{2} \cdot e_2; \\
Cor(A, C) &= \tfrac{1}{2} \cdot e_1 + e_2 + e_3; \\
Cor(B, C) &= e_1 + \tfrac{1}{2} \cdot e_3; \\
Cor(C, B) &= e_2; \\
Cor(B, A) &= e_3.
\end{aligned}
\]





Shown below is the location correlation algorithm for inferring correlation between locations. In the algorithm, b is a constant, which is set to 2. |Trip| stands for the number of locations contained in the Trip and Trip[i] represents the ith location in Trip. For example, regarding Trip1, |Trip|=3, Trip[0]=A (the first location), Trip[1]=B, and Cor(Trip[0], Trip[1])=Cor(A,B).












CalculateLocationCorrelation(L, E, H, Tp)

Input: A collection of users' travel experiences E and their location histories H, location collection L, and a threshold Tp for trip partition.

Output: A matrix Cor describing the correlation between locations.

 1. Foreach location lp∈L Do
 2.   Foreach location lq∈L, p≠q Do
 3.     Cor(lp, lq)=0; //initialize the location correlation
 4. Foreach hk∈H Do //each user's location history
 5.   TP=TripPartition(hk, Tp); //partition uk's location history into trips
 6.   Foreach Trip in TP Do
 7.     For i=0; i<|Trip|; i++ //ith location contained in Trip
 8.       For j=i+1; j<|Trip|; j++
 9.         α=b^(−(j−i−1)); //damping factor, b is a constant
10.         Cor(Trip[i], Trip[j]) += α·ek;
11. Foreach lp∈L Do
12.   Foreach lq∈L, p≠q Do //normalization
13.     Cor(lp, lq)=Cor(lp, lq)/||(Cor(lp, l0), ..., Cor(lp, l|L|−1))||1;
14. Return Cor;
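The accumulation and normalization steps can be sketched in Python as follows. The sketch assumes each user's location history has already been partitioned into trips given as lists of location IDs; the function name, the dictionary-based sparse matrix, the optional `normalize` flag, and the skipping of pairs where a location repeats within a trip are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def calculate_location_correlation(
    trips_by_user: Dict[str, List[List[str]]],
    experience: Dict[str, float],
    b: float = 2.0,
    normalize: bool = False,
) -> Dict[Tuple[str, str], float]:
    """Accumulate Cor(p, q) += alpha * e_k over every ordered pair in every trip."""
    cor: Dict[Tuple[str, str], float] = defaultdict(float)
    for user, trips in trips_by_user.items():
        e_k = experience[user]
        for trip in trips:
            for i in range(len(trip)):
                for j in range(i + 1, len(trip)):
                    if trip[i] == trip[j]:
                        continue  # only pairs with p != q are tracked (assumption)
                    alpha = b ** -(j - i - 1)  # damping factor
                    cor[(trip[i], trip[j])] += alpha * e_k
    if normalize:
        # Divide each row by its L1 norm, as in the normalization step.
        row_sum: Dict[str, float] = defaultdict(float)
        for (p, _), value in cor.items():
            row_sum[p] += abs(value)
        return {(p, q): v / row_sum[p] for (p, q), v in cor.items()}
    return dict(cor)
```

Applied to three single-trip users with trips (A, B, C), (A, C, B), and (B, A, C), the unnormalized result gives Cor(A,B) as the first user's experience plus half of the second user's, which agrees with the hand calculation for FIG. 9.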









In an implementation, there may be n trips in a dataset and the average length of a trip is m. The mining algorithm takes

\[ O\left( 2|L|^2 + \frac{m(m-1)}{2} \cdot n \right) \]

time. So, the overall computing complexity Q of this approach is the combination of inferring user travel experience and calculating the location correlation, i.e.,

\[ Q = O\left( 2w|L||U| + 2|L|^2 + \frac{m(m-1)}{2} \cdot n \right). \]






The correlation results may be mined. Shown below is an algorithm for mining the correlation. As mentioned previously, lines 2-4 illustrate detecting stay points and formulating location histories into a sequence of stay points. Lines 5 and 6 illustrate the clustering of all of the users' stay points. Lines 7 and 8 illustrate representing the location history by a sequence of stay point clusters called locations. Lines 9 and 10 show collecting the location histories and using the iterative model to learn each user's travel experience. Line 11 illustrates the algorithm used to calculate the correlation.












MiningLocationCorrelation(U, TRAJ, Tr, Dr, Tp)

Input: A collection of users U and their trajectories TRAJ={Trajk}, a time threshold Tr and a distance threshold Dr for stay point detection, and a threshold Tp for trip partition.

Output: A matrix Cor of correlation between each pair of locations.

 1. S=∅; H=∅; //temporary variables
 2. Foreach uk∈U do
 3.   ST=StayPointDetection(Trajk, Tr, Dr);
 4.   hk=LocHistPresent(ST); //a sequence of stay points
 5.   S=S∪ST; //a collection of all users' stay points
 6. L=Clustering(S); //detect locations by clustering the stay points
 7. Foreach uk∈U do
 8.   hk=LocHistRepresent(hk, L); //a sequence of locations
 9.   H=H∪hk; //a collection of all users' location histories
10. E=InferUserExperience(U, L, H);
11. Cor=CalculateLocationCorrelation(L, E, H, Tp);
12. Return Cor.









Once the results have been mined, they may be stored in the database 116. As discussed above, certain acts in processes 300, 600, and 800 need not be performed in the order described, may be modified and/or may be omitted entirely, depending on the circumstances.


Exemplary Server Implementation


FIG. 10 is a block diagram showing an exemplary server usable with the environment of FIG. 1. The server 112 may be configured as any suitable system capable of services, which includes, but is not limited to, implementing the location correlation service 106 for online services, such as providing recommendations. In one exemplary configuration, the server 112 comprises at least one processor 1000, a memory 1002, and a communication connection(s) 1004. The communication connection(s) 1004 may include access to a wide area network (WAN) module, a local area network module (e.g., WiFi), a personal area network module (e.g., Bluetooth), and/or any other suitable communication modules to allow the server 112 to communicate over the network(s) 104.


Turning to the contents of the memory 1002 in more detail, the memory 1002 may store an operating system 1006, the module for the location correlation service 106(a), the module for the location correlation application 114(a), and one or more applications 1008 for implementing all or a part of applications and/or services using the location correlation service 106.


The one or more other applications 1008 or modules may include an email application, online services, a calendar application, a navigation module, a game, and the like. The memory 1002 in this implementation may also include a location model module 1010, an inference model module 1012, and a location correlation algorithm or module 1014.


The location model module 1010 transforms and processes the data to create the location model. The process includes collecting GPS logs, parsing trajectories from the log data, extracting stay points from the trajectories, clustering stay points of individual users and of multiple users, and identifying locations.


The memory 1002 in this implementation may also include the inference model module 1012. The module 1012 integrates the travel experience of the user with locations of interest in locations visited by the individual user. The module 1012 builds a matrix and performs an iterative process for generating results based on inferring users' travel experiences and locations.


The location correlation algorithm or module 1014 determines a correlation between the locations that have been identified by the location model. The location correlation module 1014 performs calculations by evaluating users' travel experiences, their location histories, location collection, and a threshold for trip partition. Based on this input, the module 1014 determines a correlation between the locations.


The server 112 may include a content storage 1016 to store the collection of GPS logs, trajectories, stay points, clusters, location model, correlation results, and the like. Alternatively, this information may be stored on database 116.


The server 112 may also include additional removable storage 1018 and/or non-removable storage 1020. Any memory described herein may include volatile memory (such as RAM), nonvolatile memory, removable memory, and/or non-removable memory, implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, applications, program modules, emails, and/or other content. Also, any of the processors described herein may include onboard memory in addition to or instead of the memory shown in the figures. The memory may include storage media such as, but not limited to, random access memory (RAM), read only memory (ROM), flash memory, optical storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the respective systems and devices.


The server as described above may be implemented in various types of systems or networks. For example, the server may be a part of, including but not limited to, a client-server system, a peer-to-peer computer network, a distributed network, an enterprise architecture, a local area network, a wide area network, a virtual private network, a storage area network, and the like.


Various instructions, methods, techniques, applications, and modules described herein may be implemented as computer-executable instructions that are executable by one or more computers, servers, or telecommunication devices. Generally, program modules include routines, programs, objects, components, data structures, etc. for performing particular tasks or implementing particular abstract data types. These program modules and the like may be executed as native code or may be downloaded and executed, such as in a virtual machine or other just-in-time compilation execution environment. The functionality of the program modules may be combined or distributed as desired in various implementations. An implementation of these modules and techniques may be stored on or transmitted across some form of computer-readable media.


Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claims.

Claims
  • 1. A method implemented at least partially by a processor, the method comprising: collecting global positioning system (GPS) logs of geospatial locations of multiple users captured over time; constructing a location model for each individual user of the multiple users by: identifying trajectories representing trips of the individual user based on the GPS logs of geospatial locations captured over time; extracting stay points from the trajectories, each stay point representing a geographical region where the individual user stayed over a time threshold within a distance threshold; and formulating a location history for the individual user based on a sequence of the extracted stay points.
  • 2. The method of claim 1, wherein the GPS logs include a sequence of GPS points representing geospatial locations of the individual user captured over a time period, and wherein the GPS points each contain a date, a time, a longitude, and a latitude.
  • 3. The method of claim 1, further comprising: clustering the stay points of geographical regions for the individual user to form clusters of stay points; and removing a top two clusters of stay points having a greatest number of stay points to eliminate the geographical regions that are private to the individual user.
  • 4. The method of claim 1, further comprising: clustering the stay points that are extracted into clusters corresponding to the geographical regions based on a density-based clustering algorithm; and detecting clusters with irregular structures.
  • 5. The method of claim 1, further comprising clustering the stay points that are extracted to form clusters by using a density-based clustering algorithm, the algorithm based at least in part on a core-distance parameter and a minimum number of stay points falling within the core-distance.
  • 6. The method of claim 1, further comprising: creating a dataset of location histories of the multiple users; partitioning the dataset of the multiple users into clusters by employing a density-based clustering algorithm; assigning the stay points in the dataset into clusters of geographical regions that are similar; substituting a stay point in the location history of the individual user with an identification of a cluster; and identifying locations of geographical regions based on the clustering of the stay points.
  • 7. The method of claim 1, further comprising: identifying that a travel time spent between two consecutive stay points in the location history of the individual user exceeds a predetermined threshold and, in response, partitioning the location history of the individual user into more than one trip; or identifying that the travel time spent between the two consecutive stay points does not exceed the predetermined threshold and, in response, leaving the location history of the individual as a trip.
  • 8. The method of claim 1, further comprising computing a correlation between locations in location histories of the multiple users by: identifying a number of individual users visiting the locations; integrating travel experiences of the individual users who have visited the locations in a weighted manner; determining a common travel sequence which multiple users followed between two stay points; and identifying a correlation between the locations based on the travel experiences and the common travel sequence.
  • 9. The method of claim 8, further comprising presenting a user with a recommendation, based at least in part on the correlation between locations, the recommendation being based on a user's present geospatial location, a prediction of the user's interest in a location, locations within a threshold travel time, and/or locations within a predetermined distance from the user's present geospatial location.
  • 10. One or more computer-readable media encoded with instructions that, when executed by a processor, perform acts comprising: accessing a location model constructed from global positioning system (GPS) logs of geographical locations to identify locations for calculating a correlation between identified locations; calculating a correlation between the identified locations from the location model based on using an algorithm for: identifying a number of individual users visiting the identified locations in a trip; integrating the travel experiences of the number of individual users who have visited the identified locations in a weighted manner; determining a common travel sequence based on the number of individual users followed between stay points, each stay point representing a geographical region where an individual user stayed over a time threshold within a distance threshold; identifying a recommended location based on the correlation between the identified locations from location histories of the individual users; detecting a user's present geospatial location or accessing a geospatial location on a map; and recommending the recommended location based on detecting the user's present geospatial location or based on the geospatial location accessed on the map, wherein the recommended location is within a threshold travel time and/or locations within a predetermined distance from the geospatial location.
  • 11. The computer-readable media of claim 10, wherein the integrating the travel experiences comprises employing an inference model to infer the travel experiences by: building a matrix between individual users and locations visited by the individual users; representing a relationship between the travel experiences of the individual user and location interests of the locations visited; and calculating the travel experiences and the location interests in an iterative process to determine the travel experiences.
  • 12. The computer-readable media of claim 10, further comprising building a location model for each individual user of the multiple users to identify locations to determine the correlation, the building by: retrieving global positioning system (GPS) logs of geospatial locations of multiple users captured over time; constructing a location model for each individual user of the multiple users by: identifying trajectories representing trips of the individual user based on the GPS logs of geospatial locations captured over time; extracting stay points from the trajectories, each stay point representing a geographical region where the individual user stayed over a time threshold within a distance threshold; and formulating a location history for the individual user based on a sequence of the extracted stay points.
  • 13. The computer-readable media of claim 11, further comprising clustering the stay points of geographical regions for the individual user to form clusters of stay points; removing a top two clusters of stay points having a greatest number of stay points to eliminate the geographical regions that are private to the individual user; and reclustering the stay points after the top two clusters have been removed.
  • 14. The computer-readable media of claim 10, further comprising: creating a dataset of location histories of the multiple users; partitioning the dataset of the multiple users into clusters by employing a density-based clustering algorithm; assigning the stay points in the dataset into clusters of geographical regions that are closely related in distance; substituting a stay point in the location history of the individual user with an identification of a cluster; and identifying the identified locations of geographical regions based on the clustering of the stay points.
  • 15. The computer-readable media of claim 10, wherein the GPS logs include a sequence of GPS points representing geospatial locations of the individual user captured over a time period, and wherein the GPS points each contain a date, a time, a longitude, and a latitude.
  • 16. The computer-readable media of claim 10, further comprising: identifying that a travel time spent between two consecutive stay points in the location history of the individual user exceeds a predetermined threshold and, in response, partitioning the location history of the individual user into more than one trip; oridentifying that the travel time spent between the two consecutive stay points does not exceed the predetermined threshold and, in response, leaving the location history of the individual as a trip.
  • 17. A system comprising: a memory; a processor coupled to the memory; a location model module stored in the memory and executable on the processor to construct a location model for identifying locations, the locations based on location histories of multiple users captured over time through global positioning system (GPS) logs; and a location correlation module stored in the memory and executable on the processor to compute a correlation between the locations of the multiple users, by integrating travel experiences of individual users and determining a common travel sequence which the individual users followed between locations.
  • 18. The system of claim 17, further comprising: a location correlation application module stored in the memory and executable on the processor to provide a recommendation, based at least in part on the correlation between the locations of the multiple users, the recommendation being based on a user's present geospatial location, a prediction of the user's interest in a location, locations within a threshold travel time, and/or locations within a predetermined distance of the user's present geospatial location.
  • 19. The system of claim 17, further comprising: an inference model module to infer the travel experiences of the individual user by: building a matrix between individual users and locations visited by the individual users; representing a relationship between the travel experiences of the individual user and location interests of the locations visited; and calculating the travel experiences and the location interests for each location in an iterative process to determine the travel experiences.
  • 20. The system of claim 17, further comprising the location model module stored in the memory and executable on the processor to construct the location model by: extracting stay points from the GPS logs, each stay point representing a geographical region where the individual user stayed over a time threshold within a distance threshold; partitioning a dataset of the multiple users into clusters by employing a density-based clustering algorithm; assigning the stay points in the dataset into clusters of geographical regions; substituting a stay point in the location history of the individual user with an identification of a cluster; and identifying locations of geographical regions based on the clustering of the stay points.
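
For illustration only, the stay-point extraction recited in claims 12 and 20 and the trip partitioning recited in claim 16 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `GpsPoint`, `StayPoint`, `extract_stay_points`, and `split_into_trips` are hypothetical, the default thresholds (200 meters, 20 minutes) are placeholder values chosen for the example, the haversine formula is one of several possible distance measures, and travel time between stay points is assumed to be the gap between leaving one and arriving at the next.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import List


@dataclass
class GpsPoint:
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees
    t: float    # timestamp in seconds


@dataclass
class StayPoint:
    lat: float     # mean latitude of the grouped GPS points
    lon: float     # mean longitude of the grouped GPS points
    arrive: float  # timestamp of the first grouped point
    leave: float   # timestamp of the last grouped point


def haversine_m(a: GpsPoint, b: GpsPoint) -> float:
    """Great-circle distance in meters (assumed distance measure)."""
    earth_radius_m = 6371000.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(h))


def extract_stay_points(points: List[GpsPoint],
                        dist_m: float = 200.0,    # assumed distance threshold
                        time_s: float = 1200.0    # assumed time threshold
                        ) -> List[StayPoint]:
    """Group consecutive points that stay within dist_m of an anchor point
    for longer than time_s, and emit one stay point per group."""
    stays: List[StayPoint] = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the group while points remain within dist_m of the anchor.
        while j < n and haversine_m(points[i], points[j]) <= dist_m:
            j += 1
        group = points[i:j]
        if group[-1].t - group[0].t > time_s:
            stays.append(StayPoint(
                lat=sum(p.lat for p in group) / len(group),
                lon=sum(p.lon for p in group) / len(group),
                arrive=group[0].t,
                leave=group[-1].t,
            ))
            i = j   # continue after the group
        else:
            i += 1  # no stay here; slide the anchor forward
    return stays


def split_into_trips(stays: List[StayPoint], max_gap_s: float) -> List[List[StayPoint]]:
    """Start a new trip whenever the travel time between two consecutive
    stay points exceeds max_gap_s; otherwise keep them in the same trip."""
    trips: List[List[StayPoint]] = []
    for sp in stays:
        if trips and sp.arrive - trips[-1][-1].leave <= max_gap_s:
            trips[-1].append(sp)
        else:
            trips.append([sp])
    return trips
```

A caller would pass one user's time-ordered GPS log to `extract_stay_points` and then pass the resulting sequence to `split_into_trips`. The clustering steps of claims 13, 14, and 20 are not shown.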