The present invention relates to spatio-temporal prediction, and more specifically, to characterizing relationships among space-time events.
Spatio-temporal data refers to data that provides information about both location and time. Current technology has increased the availability of spatio-temporal data. For example, global positioning system (GPS) receivers provide location information associated with time. Consequently, the use of data analytics on spatio-temporal data, and the number of applications for those analytics, are also increasing. One such application is spatio-temporal prediction, that is, the prediction of a location and time range for an event. Exemplary spatio-temporal predictions pertain to the likelihood of crime, traffic congestion, and the characterization of epidemic spread.
According to one embodiment of the present invention, a method of characterizing relationships among spatio-temporal events includes receiving information specifying the spatio-temporal events and associated categories from one or more sources; and building, using a processor, a directed acyclic graph (DAG) indicating a relationship among the categories for each of two or more space lag (SL) and time lag (TL) sets, each of the two or more SL and TL sets defining a spatio-temporal boundary such that only the spatio-temporal events and the associated categories with (SL,TL)-neighborhoods inside the respective spatio-temporal boundary are considered in building the respective DAG, the respective (SL,TL)-neighborhood of each of the spatio-temporal events being a polygonal shape defined by the respective SL and the respective TL and the respective (SL,TL)-neighborhood of each of the categories being a union of the (SL,TL)-neighborhoods of the associated spatio-temporal events.
According to another embodiment, a system to characterize relationships among spatio-temporal events includes an input interface configured to receive information specifying the spatio-temporal events and associated categories from one or more sources; and a processor configured to build a directed acyclic graph (DAG) indicating a relationship among the categories for each of two or more space lag (SL) and time lag (TL) sets, each of the two or more SL and TL sets defining a spatio-temporal boundary such that only the spatio-temporal events and the associated categories with (SL,TL)-neighborhoods inside the respective spatio-temporal boundary are considered in building the respective DAG, the respective (SL,TL)-neighborhood of each of the spatio-temporal events being a polygonal shape defined by the respective SL and the respective TL and the respective (SL,TL)-neighborhood of each of the categories being a union of the (SL,TL)-neighborhoods of the associated spatio-temporal events.
According to yet another embodiment, a computer program product comprises instructions that, when processed by a processor, cause the processor to implement a method of characterizing relationships among spatio-temporal events. The method includes obtaining, from one or more sources, information specifying the spatio-temporal events and associated categories; and building a directed acyclic graph (DAG) indicating a relationship among the categories for each of two or more space lag (SL) and time lag (TL) sets, each of the two or more SL and TL sets defining a spatio-temporal boundary such that only the spatio-temporal events and the associated categories with (SL,TL)-neighborhoods inside the respective spatio-temporal boundary are considered in building the respective DAG, the respective (SL,TL)-neighborhood of each of the spatio-temporal events being a polygonal shape defined by the respective SL and the respective TL and the respective (SL,TL)-neighborhood of each of the categories being a union of the (SL,TL)-neighborhoods of the associated spatio-temporal events.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with the advantages and the features, refer to the description and to the drawings.
The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
As noted above, spatio-temporal data is used in spatio-temporal prediction applications. Many spatio-temporal events are related to other events. For example, the closing time and location of a bar may be related to certain crimes in the vicinity of the location. Therefore, the prediction of one type (category) of event may be improved by understanding the relationships among different categories of events. Embodiments of the system and method detailed herein relate to characterizing relationships among spatio-temporal events and, more specifically, among categories of events.
At block 315, graph enumeration begins the process of building the DAG 100 with an empty set (no edges 120). At each iteration, an edge 120 is added. Graph pruning, at block 317, then determines whether the new edge 120 should be retained or removed. The pruning relies on the processes of the statistical significance estimation portion 320 which, in turn, calls processes of the null model construction portion 330. The graph pruning at block 317 removes a statistically insignificant edge 120, as detailed below. When N is the number of event categories 110 available, the maximum possible number of edges 120 for a resulting DAG 100 is:
$$\frac{N(N-1)}{2} \quad \text{[EQ. 1]}$$
The development of a DAG 100 (the statistical significance check of each edge 120) is specific to a given space lag SL and time lag TL (SL,TL), as further discussed below. That is, two categories 110 that are related within one (SL,TL) range may not be related within a narrower (SL,TL) range. For example, SL may vary from 0 meters to 1 kilometer in increments of 50 meters, and TL may vary from 0 to 48 hours in increments of 2 hours. The number of (SL,TL) combinations considered and the SL and TL ranges themselves may be based on the application (type of event being predicted), a user input, or a combination. Thus, a given set of categories 110 may result in multiple different DAGs 100 for multiple different (SL,TL) combinations. The processes of the method shown in
As indicated above, at each iteration, a candidate edge 120 is added to the DAG 100 (D) to generate one or more candidate DAGs 100 (D*). The statistical significance of the candidate edge 120 then decides whether the candidate edge 120 is pruned or retained. Specifically, as detailed below, a number of support events associated with the candidate edge 120 is determined, and an expected number of support events is determined under a null hypothesis (a hypothesis of no relation between the categories 110 connected by the candidate edge 120). The statistical significance of the candidate edge 120 is expressed as a probability (P-value), for example, based on the number of support events and the expected number of support events. When this statistical significance exceeds a threshold statistical significance (340), the candidate edge 120 is retained.
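The enumerate-and-prune iteration described above may be illustrated by the following minimal sketch, which is offered only as one possible reading and not as the claimed implementation. Several elements are assumptions made for illustration: the function names (lag_grid, poisson_tail, creates_cycle, build_dag), the use of a Poisson tail probability as the P-value, the cutoff alpha, and the convention that an edge is retained when its P-value falls below alpha. The support and expected counts are taken as precomputed inputs for a single (SL,TL) pair, and the lag grid reuses the exemplary ranges given above.

```python
import math
from itertools import permutations


def lag_grid(sl_max=1000, sl_step=50, tl_max=48, tl_step=2):
    """Enumerate (SL, TL) pairs: SL in meters, TL in hours (exemplary ranges)."""
    return [(sl, tl)
            for sl in range(0, sl_max + 1, sl_step)
            for tl in range(0, tl_max + 1, tl_step)]


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    Assumption: the null model yields a Poisson-distributed support count.
    """
    if observed <= 0:
        return 1.0
    if expected <= 0:
        return 0.0
    term = math.exp(-expected)      # P(X = 0)
    cdf = term
    for k in range(1, observed):    # accumulate P(X <= observed - 1)
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def creates_cycle(edges, src, dst):
    """True if adding src -> dst closes a cycle, i.e. dst already reaches src."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in edges if a == node)
    return False


def build_dag(categories, support, expected, alpha=0.05):
    """Greedy enumerate-and-prune for one (SL, TL) pair.

    support[(a, b)]  : observed number of support events for edge a -> b
    expected[(a, b)] : expected number under the null hypothesis
    Both mappings are hypothetical precomputed inputs.
    """
    edges = set()                                   # start with no edges
    for a, b in permutations(categories, 2):        # candidate edge a -> b
        if creates_cycle(edges, a, b):
            continue                                # keep the graph acyclic
        p = poisson_tail(support.get((a, b), 0), expected.get((a, b), 0.0))
        if p < alpha:                               # significant: retain
            edges.add((a, b))
    return edges                                    # insignificant edges pruned


categories = ["bar_closing", "assault", "theft"]
support = {("bar_closing", "assault"): 30, ("assault", "bar_closing"): 5}
expected = {("bar_closing", "assault"): 10.0, ("assault", "bar_closing"): 5.0}
dag = build_dag(categories, support, expected)
```

In this sketch build_dag would be called once per entry of lag_grid(), with support and expected counts recomputed for each (SL,TL) pair, so that a separate DAG results for each combination. The order in which candidate edges are visited affects which edges survive the acyclicity check; the ordering used here is arbitrary.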
For a given edge 120 (e.g., A→B), the set of events belonging to category 110 A are referred to as predecessor events, and the set of events belonging to category 110 B are referred to as successor events. For each SL and TL, a number of support events is counted at block 325 (
The expected number of support events is computed under a null hypothesis of no relationships. That is, for example, for an edge 120 under consideration to determine if event category 110 A and event category 110 B are related (A→B), the expected number of events is the number of events in category 110 B in the (SL,TL)-neighborhood of category 110 A when there is no relationship between category 110 A and category 110 B. The density estimation (335,
Then for each SL and TL, the (SL,TL)-neighborhood of predecessor event category 110 A (according to the exemplary A→B being considered) is computed for each sub-region sr as a volume VolA(sr,TL,SL). This computation is further detailed below with reference to
$$\sum_{sr} \lambda_B(sr)\,\mathrm{Vol}_A(sr, TL, SL) \quad \text{[EQ. 3]}$$
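By way of a non-limiting illustration, the summation of EQ. 3 and a corresponding observed count may be sketched as follows. The names expected_support and count_support, the sub-region labels, and all numeric values are hypothetical. The sketch further assumes, as a simplification of the polygonal neighborhood, that a successor event supports the edge when it lies within Euclidean distance SL of some predecessor event and occurs no more than TL after it. Each successor event is counted once, consistent with the category neighborhood being a union of event neighborhoods.

```python
import math


def expected_support(density_b, volume_a):
    """EQ. 3: sum over sub-regions sr of lambda_B(sr) * Vol_A(sr, TL, SL).

    density_b[sr] : estimated density of category B events in sub-region sr
    volume_a[sr]  : volume of the (SL, TL)-neighborhood of category A in sr
    """
    return sum(density_b[sr] * volume_a.get(sr, 0.0) for sr in density_b)


def count_support(events_a, events_b, sl, tl):
    """Count category B events inside the union of A-event neighborhoods.

    Events are (x, y, t) tuples. Assumed neighborhood: distance <= sl and
    0 <= t_b - t_a <= tl (a simplification of the polygonal shape).
    """
    count = 0
    for xb, yb, tb in events_b:
        if any(math.hypot(xb - xa, yb - ya) <= sl and 0 <= tb - ta <= tl
               for xa, ya, ta in events_a):
            count += 1              # counted once even if near several A events
    return count


# Hypothetical inputs: densities in events per (square meter x hour),
# neighborhood volumes in (square meter x hour).
density_b = {"north": 0.002, "south": 0.0005}
volume_a = {"north": 4000.0, "south": 2000.0}
expected = expected_support(density_b, volume_a)

events_a = [(0, 0, 0), (1000, 1000, 10)]
events_b = [(30, 40, 1), (30, 40, 5), (5000, 5000, 1), (1010, 1000, 9)]
observed = count_support(events_a, events_b, sl=100, tl=2)
```

With these illustrative inputs, only the first category B event falls inside a neighborhood: the second occurs too late, the third is too far away, and the fourth precedes the nearby predecessor event.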
This expected number is returned to be used in the computation of the P-value (327,
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
The flow diagrams depicted herein are just one example. There may be many variations to this diagram or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.