Global Navigation Satellite Systems (GNSS) are becoming increasingly ubiquitous in both military and civilian applications for tracking the movement of people and goods. However, as millions of GNSS receivers are monitored at frequencies of up to once per second, location-aware information systems struggle to quickly process the overwhelming amount of location data and transform this information into actionable intelligence.
GPS-enabled mobile phones supply data that could be used to analyze traffic patterns and also provide a means of receiving notifications. Market research expects GPS device shipments alone to have a compound annual growth rate of more than 25% through 2013. Additionally, global penetration of GNSS in mobile phones is expected to surpass 50% by 2015. Given that there were an estimated 5.3 billion mobile phone subscribers at the end of 2010, the number of GNSS-enabled mobile phones emerging over the next few years will be staggering.
Accordingly, what is needed in the art is a system and method designed to rapidly analyze raw GNSS position tracking data which maintains the spatial and temporal properties of the data associated with the movement of the user from one point-of-interest to another.
The present invention provides an unsupervised method for fast GNSS clustering that quickly translates a large collection of GNSS position data into a series of Points-of-Interest (POIs), which define spatial dimensions where a user has stopped for a significant amount of time, and trips, which define the spatial and temporal properties of movement from one POI to another. The method of the present invention uses a balanced binary tree to represent a cluster and exploits the properties of binary trees to perform merges between two clusters in logarithmic running time while maintaining an O(n) memory storage requirement during execution. The fast GNSS clustering method of the present invention is also capable of merging disjoint, ambiguously related trees when no exact relationship exists. The method of the present invention avoids the scalability pitfalls of hierarchical clustering algorithms and is specifically designed to handle moderately large tracking databases, where a single day's worth of data for one user can total more than 1000 points after pre-processing.
The present invention provides a method of generating a travel history for a user from a set of global navigation satellite system (GNSS) data for the user. The method may include acquiring a set of time-stamped GNSS data recorded by a user's mobile device, the time-stamped GNSS data comprising spatial and temporal information; defining a plurality of temporally ordered points-of-interest (POIs) for the user based upon the acquired set of time-stamped GNSS data, wherein each of the plurality of POIs defines a spatial dimension where the user has stopped for a significant amount of time; and identifying a plurality of trips taken by the user between the plurality of defined POIs to generate a travel history for the user, wherein each of the plurality of trips originates and terminates at one of the plurality of POIs and wherein each of the plurality of trips defines a spatial and temporal property for movement between two of the plurality of POIs. The merging of the POIs is performed in logarithmic running time while maintaining an O(n) memory storage requirement during execution.
An embodiment of the present invention may include a non-transitory computer readable storage medium having a method encoded thereon for performing the inventive method.
Another embodiment of the present invention may include a computer system comprising a central processing unit for generating a travel history for a user from a set of global navigation satellite system (GNSS) data for the user by acquiring a set of time-stamped GNSS data recorded by a user's mobile device, the time-stamped GNSS data comprising spatial and temporal information; defining a plurality of temporally ordered points-of-interest (POIs) for the user based upon the acquired set of time-stamped GNSS data, wherein each of the plurality of POIs defines a spatial dimension where the user has stopped for a significant amount of time; and identifying a plurality of trips taken by the user between the plurality of defined POIs to generate a travel history for the user, wherein each of the plurality of trips originates and terminates at one of the plurality of POIs and wherein each of the plurality of trips defines a spatial and temporal property for movement between two of the plurality of POIs. The computer system further comprises a memory unit coupled to the central processing unit, the memory unit having an O(n) memory storage requirement.
The present invention uses a balanced binary tree to represent a cluster (POI) and exploits the properties of binary trees to perform merges between two clusters in logarithmic running time, while maintaining an O(n) memory storage requirement during execution.
The present invention now will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific embodiments by which the invention may be practiced. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Among other things, the present invention may be embodied as methods or devices. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. The software may also be part of a computer system and may be stored on a non-transitory computer readable medium prior to execution. The following detailed description is, therefore, not to be taken in a limiting sense.
The present invention is a system and method that is able to automatically generate spatial points-of-interest and trip information from raw location data, such as positions calculated using the Global Positioning System (GPS), that are recorded by a mobile positioning device, such as a GPS-enabled cell phone.
The method of the present invention identifies POIs (points of interest) where a user may have lingered during their travels. The input to the clustering method of the present invention is a session, which consists of a set of time-stamped GPS points recorded by a user's mobile device (e.g., a mobile phone) over a duration of time. As such, POIs detected in a single session might include the user's home, place of work, or recreational area. The remaining unclustered points in the session dataset lie between POIs and can be considered a trip (i.e., a segment that joins an origin POI and a destination POI). Thus, as the user's travel history is being recreated from the acquired dataset, it is necessary to know at what time the user arrived at and departed from each POI. If one cluster ID were assigned to all points within a cluster, this could easily be done by iterating through the original set of points and finding the first and last points with a given cluster ID, provided the points in the original set were stored in temporal order.
The temporally ordered balanced binary tree resulting from the present invention has the added benefit that the maximum and minimum times can be retrieved in O(log n), which is useful for the database insertions that must be executed to assign auto-generated keys to some element(s) in the cluster. The present design exploits the logarithmic structure of a binary tree to perform the various operations used to merge two AVL trees. Each element (point) in the dataset is indexed so that the algorithm can compute $d(i, j)$, the distance between $\nu_i$ and $\nu_j$.
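The first-and-last scan described above can be sketched as follows. This is a minimal illustration, assuming the points are already sorted by timestamp and that each point carries a cluster ID (or `None` for an unclustered trip point); the function name `arrival_departure` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def arrival_departure(
    timestamps: List[float], cluster_ids: List[Optional[int]]
) -> Dict[int, Tuple[float, float]]:
    """Map each cluster ID to its (arrival, departure) times.

    Assumes `timestamps` is sorted in increasing order and that
    cluster_ids[k] is the cluster of point k, or None for a trip point.
    """
    spans: Dict[int, Tuple[float, float]] = {}
    for t, cid in zip(timestamps, cluster_ids):
        if cid is None:
            continue  # unclustered point: part of a trip
        if cid not in spans:
            spans[cid] = (t, t)  # first point seen is the arrival
        else:
            spans[cid] = (spans[cid][0], t)  # latest point so far is the departure
    return spans


times = [0, 60, 120, 180, 240, 300, 360]
ids = [1, 1, 1, None, None, 2, 2]
print(arrival_departure(times, ids))  # {1: (0, 120), 2: (300, 360)}
```

The scan is O(n) in the size of the session. Note that it reports the first arrival and the last departure for a given cluster ID, so repeated visits to the same POI are collapsed into a single span.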
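The specification does not fix the distance function $d(i, j)$ at this point; the experiments described later use great-circle distance. A minimal sketch, assuming a spherical Earth of mean radius 6,371,008.8 m and points stored as (lat, lon) pairs in degrees, is the haversine formula below. The names `great_circle_distance` and `d` are hypothetical.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical-Earth assumption)


def great_circle_distance(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Haversine distance in metres between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def d(points: Sequence[Tuple[float, float]], i: int, j: int) -> float:
    """d(i, j): distance between the indexed points v_i and v_j."""
    return great_circle_distance(points[i], points[j])


# One degree of latitude along a meridian is roughly 111.2 km on this sphere.
print(round(great_circle_distance((0.0, 0.0), (1.0, 0.0))))
```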
The following notation will be used in the remainder of the specification:
Let $\{\nu_k\}$, $k = 1, \ldots, |T_i|$, $\nu_k < \nu_{k+1}$, be the monotonically increasing sequence over all nodes in $T_i$. Let $\{q_k\}$, $k = 1, \ldots, |T_j|$, $q_k < q_{k+1}$, be the monotonically increasing sequence over all nodes in $T_j$. Let $\{q_{n_k}\}$ be the longest subsequence of $\{q_k\}$, where $\{n_k\}$ is a monotonically increasing sequence over the index set $\{1, \ldots, |T_j|\}$, such that $M < q_{n_k} < N$ for $M, N \in \{\nu_k\}$. That is, $\{q_{n_k}\}$ is the longest possible subsequence of $\{q_k\}$ bounded by the times of two points $M, N \in \{\nu_k\}$. When $\{q_{n_k}\}$ is nonempty, the criterion for using the union function in is not satisfied. In other words, there is a partial (or complete) overlap between the range of $T_j$ and the range of $T_i$, as shown with reference to
In the following method steps described, it is assumed that the GNSS data being processed is from a single user and that all timestamps for individual location data points are unique.
With reference to
The partition algorithm of the present invention is similar to a recursive binary search, in which a tree is recursively searched for a given key. The partition algorithm is given a key $\kappa$, which it attempts to find in $\hat{T}_j$. This key $\kappa$ will be a timestamp that is unique to $\hat{T}_i$, either $N$ or $M$. $N$ or $M$ would be chosen according to
Regardless of how far the algorithm has to search for $\kappa$ (the left or right vertices along the base of triangle 30 in
Here $n = |T_j|$. Solving this recurrence relation using the Akra-Bazzi theorem gives the solution $\Theta(\log^2 n)$. The Akra-Bazzi theorem is a generalization of the master method that solves recurrences with more general functions $f(n)$ and subproblems of unequal size.
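The recursive, binary-search-like descent of the partition step can be illustrated by splitting a binary search tree around a key $\kappa$ that does not occur in the tree. This is a sketch under simplifying assumptions. The tree is an unbalanced BST keyed by timestamp with no AVL rebalancing, so it shows the search pattern rather than the logarithmic bounds. `Node`, `insert`, `split`, `inorder`, and `build` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Node:
    key: float  # timestamp of a GNSS point
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: float) -> Node:
    """Plain (unbalanced) BST insertion, used here only to build examples."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def split(root: Optional[Node], kappa: float) -> Tuple[Optional[Node], Optional[Node]]:
    """Partition a tree around kappa, descending as in a binary search.

    Returns (lo, hi) with every key in lo < kappa and every key in hi > kappa.
    kappa is assumed not to occur in the tree (it is unique to the other tree).
    """
    if root is None:
        return None, None
    if kappa < root.key:
        lo, hi = split(root.left, kappa)
        root.left = hi  # keys in (kappa, root.key) stay to the left of root
        return lo, root
    lo, hi = split(root.right, kappa)
    root.right = lo  # keys in (root.key, kappa) stay to the right of root
    return root, hi


def inorder(root: Optional[Node]) -> List[float]:
    """Keys in increasing (temporal) order."""
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)


def build(keys: Iterable[float]) -> Optional[Node]:
    root = None
    for k in keys:
        root = insert(root, k)
    return root


tj = build([50, 20, 80, 10, 30, 70, 90])
lo, hi = split(tj, 60)
print(inorder(lo), inorder(hi))  # [10, 20, 30, 50] [70, 80, 90]
```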
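The recurrence itself is not reproduced here. Assume it has the form $T(n) = T(n/2) + \Theta(\log n)$, with $n$ a power of two and $T(1) = 0$. That form is consistent with a binary-search descent that does logarithmic work per level. Under this assumption, both direct unrolling and the Akra-Bazzi formula give the stated bound:

```latex
% Direct unrolling (logarithms base 2):
T(n) = \sum_{i=0}^{\log_2 n - 1} c \log_2\!\left(\frac{n}{2^{i}}\right)
     = c \sum_{i=0}^{\log_2 n - 1} (\log_2 n - i)
     = c\,\frac{\log_2 n\,(\log_2 n + 1)}{2}
     = \Theta(\log^2 n)

% Akra-Bazzi: a_1 = 1, b_1 = 1/2, so (1/2)^p = 1 gives p = 0; with g(u) = \ln u,
T(n) = \Theta\!\left(n^{0}\left(1 + \int_{1}^{n} \frac{\ln u}{u}\,du\right)\right)
     = \Theta\!\left(1 + \frac{(\ln n)^{2}}{2}\right)
     = \Theta(\log^2 n)
```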
The final procedure is executed when $T_j$ is entirely bounded by two points (timestamps) in $T_i$, that is, when $\{q_{n_k}\} = \{q_k\}$, so that $\min(T_i) \le M < q_{n_0} = \min(T_j) < \cdots < q_{n_K} = \max(T_j) < N \le \max(T_i)$, as shown with reference to
The merge procedure tries to find a node $\nu \in \hat{T}_i$ such that $\nu \in (q_{n_0}, q_{n_K})$. If $\nu$ is a terminal node, remove $\nu$ from $\hat{T}_i$ (by removing the reference to its parent) and insert it into $\hat{T}_j$. Then, on each step of the return back to the root, union $T_j$ with the subtree rooted at the child of $\nu$ opposite the search direction. Finally, insert $\nu$ into the new $\hat{T}_k$, as shown with reference to
If $\nu$ is not a terminal node, insert $\hat{T}_j \to \mathrm{root}$ into $T_\nu$, break $\hat{T}_j$ into its left and right subtrees, and call the algorithm recursively on both subtrees. Note that by calling the algorithm recursively using $T_\nu \to \mathrm{root}$, we avoid having to traverse $\hat{T}_\nu$ again.
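The situations described so far are ranges that do not overlap, partial overlap, and one tree bounded by two points of the other. They can be told apart from the minimum and maximum timestamps alone. The sketch below is one reading of those conditions. It assumes each cluster's timestamps are available as a sorted sequence and that all timestamps are unique. `relationship` and its return labels are hypothetical, with "disjoint" taken to be the case handled by the union function.

```python
from typing import Sequence


def relationship(ti: Sequence[float], tj: Sequence[float]) -> str:
    """Classify how the time ranges of two clusters relate.

    ti and tj are the temporally sorted timestamps of T_i and T_j.
    Timestamps are assumed unique, so no comparison below can be an equality.
    """
    if ti[-1] < tj[0] or tj[-1] < ti[0]:
        return "disjoint"   # ranges do not overlap: union case
    if ti[0] < tj[0] and tj[-1] < ti[-1]:
        return "j_bounded"  # T_j lies between two points M, N of T_i: final merge case
    if tj[0] < ti[0] and ti[-1] < tj[-1]:
        return "i_bounded"  # same case with the roles of T_i and T_j swapped
    return "partial"        # ranges partially overlap: partition case


print(relationship([1, 5, 9], [3, 4]))
```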
In an analysis of the time complexity, let
and
operates on a portion of $T_i$ in each recursion. For instance, if $\nu \in (\min(\hat{T}_k), \max(\hat{T}_k))$ but $\nu$ is not terminal, then the algorithm will operate on $T_\nu$ in the next recursion until $\nu$ is some terminal node that satisfies the condition $\nu \in (\min(\hat{T}_k), \max(\hat{T}_k))$ or $T_k$ is exhausted through deletion. The length of the path to a satisfactory $\nu$ is $O(\log n)$. For the best- and worst-case running times, we assume that $\log n \le m$.
The best-case scenario is that the first $\nu$ such that $\nu \in (\min(\hat{T}_k), \max(\hat{T}_k))$ is a terminal node. $\hat{T}_k$ is then merged with the subtree rooted at the child in the opposite search direction. As we attach $\hat{T}_k$ at some terminal node in $\hat{T}_i$, $n$ will only increase on the return to the root. Recall that $n$ will increase because $\hat{T}_k$ will be merged with $T_\nu \to \mathrm{opp}$, where opp is the direction opposite the search direction. The recursive formula for the best-case running time is $T(n) = T(n/2) + \Theta(\log(n+m))$. It takes $O(\log n)$ to find $\nu$, so solving this recurrence gives
Thus, the best-case running time is $O(\log n \cdot \log(n+m))$.
In the worst case, $\forall \nu \in \hat{T}_i,\ \nu \in (\min(\hat{T}_k), \max(\hat{T}_k))$; then, in each recursion, the root of $T_k$ is inserted into $T_\nu$, and the algorithm is called recursively on $T_\nu \to \mathrm{left}$ and $\hat{T}_\nu \to \mathrm{right}$. As the size of $\hat{T}_i$ grows by 1 a total of $m$ times, the running time is $\log\big((n+1)(n+2)\cdots(n+m)\big) = \sum_{i=1}^{m} \log(n+i) < m \log(2n) \in O(m \log n)$, where the inequality holds for $m < n$. This cost also includes the time $2\log(n+i)$, $i = 0, \ldots, m-1$, to find $\nu$ and insert $T_j \to \mathrm{root}$ for each recursion, and the time $O(\log(n+i))$ to union the left and right subtrees of $\nu$.
If the assumption that $\log n \le m$ were not made, the worst-case running time would simply be $O(\max(m, \log n) \cdot \log n)$.
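The best-case bound can be checked numerically by unrolling the recurrence. The sketch is illustrative only and rests on three assumptions: each $\Theta(\log(n+m))$ term equals $\log_2(n+m)$ exactly, $m$ stays fixed while $n$ halves, and $T(1) = 0$. `best_case_cost` is a hypothetical name. There are $\lfloor \log_2 n \rfloor$ levels and each adds at most $\log_2(n+m)$, so the total never exceeds $\log_2 n \cdot \log_2(n+m)$.

```python
import math


def best_case_cost(n: int, m: int) -> float:
    """Unroll T(n) = T(n/2) + log2(n + m) with T(1) = 0.

    Each Theta(log(n + m)) term is taken as exactly log2(n + m), and m is
    held fixed while n halves; both are simplifying assumptions.
    """
    total = 0.0
    while n > 1:
        total += math.log2(n + m)
        n //= 2
    return total


for n, m in [(1024, 16), (2**20, 64)]:
    bound = math.log2(n) * math.log2(n + m)
    print(f"n={n}, m={m}: unrolled={best_case_cost(n, m):.1f}, log n * log(n+m)={bound:.1f}")
```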
Let $n = |T_i|$ and $m = |T_j|$. Finding the final set of clusters requires iterating through
pairs, computing the distance between them, and merging $\hat{T}_i$ and $\hat{T}_j$ if they are disjoint. Because $\nu_k$, $k = i, j$, will not necessarily be the root of $\hat{T}_k$, the algorithm always travels up the tree (until the parent pointer is null) before it merges two trees. For instance, a node that is added to a tree might end up as a terminal node. An exemplary embodiment of the cluster algorithm of the present invention is illustrated with reference to
In a time complexity analysis, because only disjoint clusters are merged, there can be at most $N$ mergers. As $\max(m, \log n) \cdot \log n < N \log N$ for all $(T_i, T_j)$, the worst-case time spent finding the final set of clusters is $O(N^2 \log N)$.
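The pairwise loop can be sketched end to end under several stated substitutions. A union-find parent array stands in for climbing AVL parent pointers to a root. Each cluster's members are kept as a time-ordered Python list merged with `heapq.merge`, which takes linear rather than logarithmic time. The following are all assumptions made for illustration: the distance threshold `eps_m`, the haversine metric, the rule of discarding single-point clusters as trip points, and all function names.

```python
import heapq
import math
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (timestamp, lat, lon)


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance in metres on a sphere of mean Earth radius."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[1], p[2], q[1], q[2]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_008.8 * math.asin(math.sqrt(a))


def cluster(points: List[Point], eps_m: float) -> List[List[int]]:
    """Group point indices linked by chains of pairwise distances below eps_m.

    `points` must be sorted by timestamp, so index order is temporal order.
    """
    parent = list(range(len(points)))  # parent pointer per point
    members: Dict[int, List[int]] = {i: [i] for i in range(len(points))}

    def root(i: int) -> int:
        while parent[i] != i:  # climb until the root is reached
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if haversine_m(points[i], points[j]) >= eps_m:
            continue
        ri, rj = root(i), root(j)
        if ri == rj:
            continue  # already in the same cluster
        if len(members[ri]) < len(members[rj]):
            ri, rj = rj, ri  # attach the smaller cluster to the larger one
        # Both lists are in index (= temporal) order, so the merge stays ordered.
        members[ri] = list(heapq.merge(members[ri], members.pop(rj)))
        parent[rj] = ri
    # Single points are treated as trip points, not POIs (an assumption).
    return [m for m in members.values() if len(m) > 1]


pts = [(0, 28.0, -82.0), (60, 28.0001, -82.0), (120, 28.0, -82.0001),
       (180, 28.01, -82.0), (240, 28.02, -82.0),
       (300, 28.03, -82.0), (360, 28.0301, -82.0)]
print(cluster(pts, 50.0))  # [[0, 1, 2], [5, 6]]
```

The points are sorted by time, so index order equals temporal order. Each merged member list therefore stays temporally ordered without a separate sort. This mirrors the property the AVL representation is meant to preserve.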
In an experimental comparison, the method of the present invention was compared to another method currently known in the art, DBScan. DBScan was selected because of its fast running time; it is widely used in clustering applications for its efficiency and noise reduction filtering. Both algorithms were executed on a 2.0 GHz AMD Athlon™ 64 X2 Dual Core Processor 3800+ with 2.00 GB of RAM. We modified the DBScan implementation in to use great-circle distance rather than Euclidean distance for our testing purposes. Both algorithms received the same sequence of session data as input. The running time of the DBScan implementation is $O(n^2)$. The table shown in
The results of clustering some of the largest datasets, consisting of unique (time-stamped) points, are presented in Table II of
The results show that the method of the present invention (referred to as Fast GNSS Clustering) is faster than DBScan. For Fast GNSS, in accordance with the method of the present invention, ordering the GPS points is crucial because it reduces the number of bounded merges that must be performed to merge $T_i$ and $T_j$. By ordering the points, the algorithm is more likely to build a cluster in temporal order, so that when two trees are to be merged they benefit from being disjoint. The results indicate that our estimated worst-case running time of $O(n^2 \log n)$ for Fast GNSS clustering is an overestimate for inputs of this size: although DBScan has a running time of $O(n^2)$, our algorithm outperformed DBScan on all inputs, sometimes by several seconds. As expected, both algorithms found identical clusters. DBScan's noise detection capability may or may not be beneficial, depending on the duration of the time spent at the POI. For instance, a two-point cluster could very well span a duration that is significant enough to be considered a legitimate POI if GPS signal coverage is weak in that location.
Because Fast GNSS Clustering of the present invention lacks noise reduction, one consequence has been the identification of smaller clusters, or pseudo-POIs, that do not actually represent a location where a person performed an activity. Instead, pseudo-POIs typically identify a location where a person has briefly hesitated during travel, often as a result of a traffic delay on roads. For pedestrian data, pseudo-POIs often occur as users wait for pedestrian crosswalk signals at traffic lights. While pseudo-POIs are undesirable in terms of creating POIs and trips, this data could provide insight into traffic delays to aid in traffic signal retiming, road construction and enhancement, and even identifying locations where advertising is most likely to be visible to a traveler. The duration of exposure to advertising could even be measured, since the direction the user is facing during travel is also known from the GNSS data.
The primary purpose of Fast GNSS in accordance with the present invention is to build a travel record for the user. Two operations may be executed on the user's point set: (1) inserting clusters into a database and (2) extracting trips from the database.
Finding the clusters from the user's points is required to construct and insert a database record that represents each cluster. This record would include information such as a unique clusterID, the arrival and departure times, and a spatial polygon, which is built by creating a string of points representing the (lat, long) pairs and inserting it into a spatial database. We can iterate through the cluster in O(n) using a level-order traversal to construct the geospatial string. A level-order traversal does not use the temporally ordered property of the cluster. The temporal ordering is used when we need to retrieve the starting and ending times, which is done in O(log n). With an unordered clustering approach, for example DBScan, this would take O(n log n), where n is the size of the cluster, using an optimal comparison-based sorting algorithm such as mergesort to temporally sort the unordered cluster. The advantage of Fast GNSS is that the temporal ordering is persisted throughout the execution of the algorithm, which eliminates the need for post-processing such as running sorting algorithms on the extracted clusters. Of course, instead of using mergesort, the first element within a cluster could be found by iterating through the set of input points until the first point with the clusterID is reached, but this would require O(n), which is still slower than O(log n).
A user's trips are simply the unclustered portions between two clusters. An exemplary algorithm to find these segments is shown in
The method of the present invention is able to rapidly analyze raw GNSS position data, identify Points-of-Interest (POIs) (i.e., clusters), or locations where a tracked user or object pauses for a significant amount of time, and segment travel behavior into user trips from one POI to another. The method of the present invention uses AVL trees to merge clusters in logarithmic running time while maintaining an O(n) memory storage requirement during execution. The method of the present invention also keeps the GNSS data within a cluster ordered by the time of the position fix, to aid in the rapid extraction of travel information such as arrivals at and departures from POIs.
The main benefit of Fast GNSS is its space-saving property, but the maximum and minimum elements of the clusters it generates can also be retrieved in O(log n) time, which could be useful for creating a user's travel history. Fast GNSS Clustering is also capable of merging disjoint, ambiguously related trees when no exact relationship exists. An ambiguous relationship occurs when (part of) the time range of one cluster/tree overlaps the time range of the other cluster. Two clusters are merged when they are disjoint and the distance between the two points of some pair in the Cartesian product of their points is less than a distance c. As a result of using AVL trees, the relationships among the points within a cluster are an implicit property of the cluster itself rather than the responsibility of the proximity matrix commonly used in hierarchical clustering.
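The two retrieval patterns in this paragraph can be sketched on a small hand-built tree keyed by timestamp. Walking to the leftmost and rightmost nodes gives the arrival and departure times in O(height), which is O(log n) for a balanced tree. A breadth-first (level-order) traversal visits every point once to build the geospatial string. The `Node` layout, the function names, and the comma-separated "lat lon" string format are hypothetical. A real spatial database would define its own geometry input format.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class Node:
    t: float  # timestamp (the tree's ordering key)
    lat: float
    lon: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def arrival_departure(root: Node) -> Tuple[float, float]:
    """Leftmost node = earliest fix, rightmost node = latest fix; O(height)."""
    lo = root
    while lo.left is not None:
        lo = lo.left
    hi = root
    while hi.right is not None:
        hi = hi.right
    return lo.t, hi.t


def point_string(root: Node) -> str:
    """Level-order (breadth-first) traversal in O(n); ignores temporal order."""
    parts: List[str] = []
    queue: Deque[Node] = deque([root])
    while queue:
        node = queue.popleft()
        parts.append(f"{node.lat} {node.lon}")
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
    return ", ".join(parts)


# A small AVL-balanced cluster keyed by timestamp (hypothetical data).
poi = Node(120, 28.0, -82.0,
           left=Node(60, 28.0001, -82.0, left=Node(0, 28.0, -82.0001)),
           right=Node(180, 28.0002, -82.0))
print(arrival_departure(poi))  # (0, 180)
print(point_string(poi))
```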
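The exemplary algorithm itself is not reproduced here. The sketch below shows one way to extract trips as the unclustered runs between clusters, assuming each point, in temporal order, carries a cluster ID or `None`. The function name and the `(origin, destination, point indices)` tuple layout are hypothetical. Runs before the first POI or after the last POI are skipped because they lack an origin or a destination. A run that leaves and returns to the same POI is kept as a round trip, which is a design choice of this sketch.

```python
from typing import List, Optional, Tuple

Trip = Tuple[int, int, List[int]]  # (origin cluster, destination cluster, point indices)


def extract_trips(labels: List[Optional[int]]) -> List[Trip]:
    """Return the unclustered segments joining consecutive clusters.

    labels[k] is the cluster ID of the k-th point in temporal order, or None
    for an unclustered point.
    """
    trips: List[Trip] = []
    origin: Optional[int] = None
    between: List[int] = []
    for k, label in enumerate(labels):
        if label is None:
            if origin is not None:
                between.append(k)  # point lies on a trip leaving `origin`
            continue
        if origin is not None and (label != origin or between):
            trips.append((origin, label, between))  # arrived at a POI
        origin, between = label, []  # (re)start from the current POI
    return trips


print(extract_trips([None, 1, 1, None, None, 2, 2, None, 1, None]))
# [(1, 2, [3, 4]), (2, 1, [7])]
```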
It will be seen that the advantages set forth above, and those made apparent from the foregoing description, are efficiently attained and since certain changes may be made in the above construction without departing from the scope of the invention, it is intended that all matters contained in the foregoing description or shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
It is also to be understood that the following claims are intended to cover all of the generic and specific features of the invention herein described, and all statements of the scope of the invention which, as a matter of language, might be said to fall therebetween. Now that the invention has been described,
This application claims priority to U.S. Provisional Patent Application No. 61/502,061, filed on Jun. 28, 2011, entitled “System and Method for Spatial Point-of-Interest Generation and Automated Trip Segmentation Using Location Data”.
This invention was made with Government support under FDOT BDK85 TWO 977-14 awarded by the Federal Department of Transportation. The Government has certain rights in the invention.
Number | Date | Country |
---|---|---|
61502061 | Jun 2011 | US |