This application is related to the following commonly assigned U.S. patent applications, which are incorporated herein by reference: U.S. patent application Ser. No. 12/823,132, “Strong Typing for Querying Information Graphs”, filed Jun. 25, 2010; and U.S. patent application Ser. No. 12/705,983, “Expressing and Executing Semantic Queries Within a Relational Database”, filed Feb. 16, 2010.
As discussed in the above-referenced patents, it is known to store and query data that is modeled as a graph. Graphs can be networks of interrelated nodes, where each node is either a subject or an object and each edge is a named value, also called a predicate. Graphs can be represented by a collection of records, where members of records can point to other records. In this way, a triple is formed as [Record, MemberName, Record pointed to by value of MemberName]. An example of such a triple or fact might be “[a particular vehicle model] [has a model name] [‘A4’]”. A semantic store might store many facts of this form, as well as semantically related facts.
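The record-to-triple mapping described above can be sketched in a few lines of Python. This is an illustrative sketch only: the `Record` and `Triple` types, the `triples_of` helper, and the member names (`model_name`, `made_by`, `ExampleCo`) are hypothetical and are not taken from the referenced applications.

```python
from typing import Iterator, NamedTuple


class Record:
    """A node in the graph: a bag of named members (hypothetical helper)."""

    def __init__(self, **members):
        self.members = dict(members)


class Triple(NamedTuple):
    subject: Record   # the record that owns the member
    predicate: str    # the member name
    obj: object       # a literal value or another Record


def triples_of(record: Record) -> Iterator[Triple]:
    """Yield one [Record, MemberName, value] triple per member of the record."""
    for name, value in record.members.items():
        yield Triple(record, name, value)


# Illustrative data; only the model name 'A4' comes from the text above.
maker = Record(name="ExampleCo")
vehicle = Record(model_name="A4", made_by=maker)
facts = list(triples_of(vehicle))
```

Here the `made_by` member points at another record, so the resulting triple is an edge between two nodes, while `model_name` yields a triple whose object is a literal.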
Graph-based data has been stored with special graph databases (e.g., Neo4j by Neo Technologies). As shown in the above-referenced patents, graph-based data can also be stored using a relational database as the underlying storage layer, with an application layer to provide access to the data using graph semantics. In either case, as information changes in a graph-based or semantic data store, information is often erased or otherwise destroyed; facts are sometimes physically deleted. Queries against this body of information are consequently invalidated or incomplete. In other words, results obtained from a query executed at one point in time might not be identically reproduced when the query is executed at a later point in time. Additionally, such systems or data stores have not supported querying of information in ways that involve inspecting the changes that have occurred in the data, nor have they supported maintaining and querying against information about lifetimes of facts even after facts have been “deleted”.
Techniques related to temporal information storage in semantic stores and temporal querying thereof are discussed below.
The following summary is included only to introduce some concepts discussed in the Detailed Description below. This summary is not comprehensive and is not intended to delineate the scope of the claimed subject matter, which is set forth by the claims presented at the end.
Described are techniques to facilitate temporal features in a semantic data store. Information about lifetimes of facts in a semantic store is maintained. Even when a fact is logically deleted, a physical record is kept available. The record of a logically deleted or invalid fact has associated lifetime information, for example, valid-from and valid-to time values. The record of a fact not yet deleted may have a valid-from time value indicating when it was created, became valid, etc. Queries against the semantic store may specify a timeslice (a point in time or a time range). The lifetime information can be used to satisfy such time-specific queries. Because records are maintained after they are logically deleted, it is also possible to accurately query a past state of the semantic store. Even if such a query is run at different times, the same results may be obtained.
Many of the attendant features will be explained below with reference to the following detailed description considered in connection with the accompanying drawings.
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein like reference numerals are used to designate like parts in the accompanying description.
Embodiments discussed below relate to temporal binding in semantic storage systems and, in particular, to providing temporal features for semantic querying. Discussion will proceed by first covering semantic or graph storage technology. A technique for using tables to provide fact lifetime support will be described next, followed by discussion of how such techniques may be used in practice when destructive operations occur. Performance of queries will be covered next, followed by discussion of how the same query can produce consistent results even though it is run at different times and even after facts matching the query may have been logically deleted.
At step 126, lifetime information is created for the newly stored fact. The lifetime information is any stored information in the graph store 100 that indicates a time corresponding to creation or beginning of logical validity of the new fact. In one embodiment, the query itself might specify the creation or “validity” time of the new fact. In other embodiments, the lifetime information is simply any time corresponding to the addition of the new fact, such as a time when the query was submitted, a time when the fact was actually stored, a time when the query was received, etc. Note that as used herein “time” may mean a time of day, a calendar date, a combination thereof, coordinated universal time (UTC), or any unit of time. Thus, some facts (e.g., facts of certain fact types) or all facts stored in the graph store 100 may have lifetime information indicating when such facts became valid, available, added, etc., in the graph store 100.
The lifetime information can be stored in a number of ways. For example, a special table can store a “valid-from” value (a time from which a fact becomes valid, is created, etc.) for each fact. In another embodiment, explained in detail below, each fact, stored as a row in a table, for example, has a valid-from time value indicating a time at which the fact became valid, was added, was generated, etc. For example, a table storing the new fact may have a valid-from column storing time values that indicate the starts of the lifespans of the corresponding facts.
When a destructive operation is performed that will delete a fact, or at least a fact of a type for which lifespan information is maintained (it is not necessary that all fact types in a graph store have lifetime information), a fact deleting process 128 is performed. Note that as used herein, physical deletion refers to physically removing a fact or record from storage, whereas logical deletion refers to modifying a fact in a way that reflects its having been deleted while still physically storing a version of the fact or record. At step 130 a request or query that would delete a fact is received. In response, at step 132, the affected or target fact is modified in a way that reflects its having been deleted. In one embodiment, the fact or corresponding record is flagged as having been deleted. In another embodiment, discussed below, the fact or corresponding record is automatically migrated to an archive table; it is physically deleted from one table (e.g., a “current” table), and physically added to the archive table. At step 134 lifetime information is added to the graph store 100 that indicates or corresponds to an ending time of the lifespan of the deleted fact. The ending time may be, for example, a time when the destructive request was issued, received, or executed, a time when the fact was migrated or created in the archive table, a time when a response was issued, or a time when the fact was otherwise deemed logically invalid. In one embodiment, the archive table has a valid-to column and an end-of-lifetime time for the deleted fact is stored in the valid-to column. In another embodiment, similar information is stored in a single global table for all logically deleted facts.
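The add and delete flows described above (step 126 and steps 130 to 134) can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions, not the implementation described in the referenced applications: the table names `vehicle_current` and `vehicle_archive`, the column names, the helper functions, and the use of plain integers as time values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: a "current" table and its archive counterpart.
    CREATE TABLE vehicle_current (
        id INTEGER PRIMARY KEY, model_name TEXT, valid_from INTEGER);
    CREATE TABLE vehicle_archive (
        id INTEGER, model_name TEXT, valid_from INTEGER, valid_to INTEGER,
        PRIMARY KEY (id, valid_from));
""")


def add_fact(conn, fact_id, model_name, now):
    """Store a new fact together with the start of its lifetime (step 126)."""
    with conn:
        conn.execute("INSERT INTO vehicle_current VALUES (?, ?, ?)",
                     (fact_id, model_name, now))


def delete_fact(conn, fact_id, now):
    """Logically delete a fact by migrating it to the archive (steps 130-134)."""
    row = conn.execute(
        "SELECT id, model_name, valid_from FROM vehicle_current WHERE id = ?",
        (fact_id,)).fetchone()
    if row is None:
        return False  # nothing to delete
    with conn:  # both statements commit together or not at all
        # Keep a physical copy and record the end of the lifetime (valid_to).
        conn.execute("INSERT INTO vehicle_archive VALUES (?, ?, ?, ?)",
                     (*row, now))
        # Physically remove the row from the current table only.
        conn.execute("DELETE FROM vehicle_current WHERE id = ?", (fact_id,))
    return True


add_fact(conn, 1, "A4", now=10)
deleted = delete_fact(conn, 1, now=20)
```

After these calls the fact no longer appears in the current table, but the archive still holds it with a lifespan running from 10 to 20.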
To support temporality, that is, to store information such as lifetime information that can support the binding of time as an independent variable, the set of tables defined for the Vehicle type is modified as follows. The original table 142 called [Vehicle$Table] is still created, but modified table 144 is provided with an extra valid-from column 146. As with the original table 142, the modified table 144 will store all current values of the corresponding record or fact type, and it is this modified table 144 that will be queried if the selected time for a desired query is ‘now’. For example, when no time value or time range is specified in a query, the time is by default assumed to be ‘now’.
In addition to modified table 144, an archive table 148 is defined in a manner similar to the modified table 144, but with the addition of a valid-to column 150, as well as another valid-from column 152. The valid-from columns 146, 152 are used to store information identifying the moments or times at which the corresponding values held in the given rows became valid. The valid-to column 150 stores information identifying the first moments or times at which the values held in the corresponding rows were no longer valid. These columns, then, represent the half-open time ranges, or lifespans, during which the rows in question should be considered valid for the values they represent: a row is valid from its valid-from time up to, but not including, its valid-to time.
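The half-open lifespan test can be written as a small predicate. The sketch below assumes integer time values and uses `None` to stand for a row that has no valid-to value yet; the function name `is_valid_at` is hypothetical.

```python
from typing import Optional


def is_valid_at(valid_from: int, valid_to: Optional[int], t: int) -> bool:
    """Return True when t lies in the half-open lifespan [valid_from, valid_to).

    A valid_to of None is treated as "not yet ended", as for a row that is
    still current.
    """
    if t < valid_from:
        return False
    return valid_to is None or t < valid_to


# Two successive versions of one fact: [0, 5) and [5, 9).
first_version_at_5 = is_valid_at(0, 5, 5)
second_version_at_5 = is_valid_at(5, 9, 5)
```

Because the range excludes its end point, two successive versions of a fact never overlap: at the instant of the change, only the newer version is considered valid.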
With this embodiment, any destructive operation that would otherwise have been performed against a current table, for example modified table 144, may be generally performed as noted in the above-referenced patents. However, the data that would be lost from such a destructive operation is backed up in a corresponding archive table, for example, archive table 148. Moreover, support for continued querying of archived data can be provided by creating a dedicated table with appropriate SQL (structured query language) indexes to support high-performance querying as prior points in time are requested. Note that a valid-to column is not included in a current table (as opposed to an archive table) because values held in the current version of a table, for example modified table 144, are considered presently valid. In other words, their presence in the current table itself signifies present validity, and valid-to information is not needed because the time at which records in a current version of a table might cease to be valid is not yet known while the records are stored therein.
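One way to picture time-specific querying over a current table and its archive is sketched below, again with `sqlite3`. The schema, the index name, the `query_vehicles` helper, the sample rows, and the integer time values are assumptions made for illustration. The past-time branch reflects one reading of the migrate-on-delete embodiment: a fact that is still current may already have been valid at the requested time, so both tables are consulted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle_current (
        id INTEGER PRIMARY KEY, model_name TEXT, valid_from INTEGER);
    CREATE TABLE vehicle_archive (
        id INTEGER, model_name TEXT, valid_from INTEGER, valid_to INTEGER,
        PRIMARY KEY (id, valid_from));
    -- Hypothetical index to speed up lookups by lifespan.
    CREATE INDEX vehicle_archive_span ON vehicle_archive (valid_from, valid_to);
    -- Illustrative data: fact 1 lived from 10 to 20; fact 2 is current since 15.
    INSERT INTO vehicle_archive VALUES (1, 'A4', 10, 20);
    INSERT INTO vehicle_current VALUES (2, 'A6', 15);
""")


def query_vehicles(conn, as_of=None):
    """Return (id, model_name) rows valid at `as_of`; None means 'now'."""
    if as_of is None:
        # Default case: only the current table needs to be read.
        return conn.execute(
            "SELECT id, model_name FROM vehicle_current ORDER BY id").fetchall()
    sql = """
        SELECT id, model_name FROM vehicle_current
         WHERE valid_from <= :t
        UNION ALL
        SELECT id, model_name FROM vehicle_archive
         WHERE valid_from <= :t AND :t < valid_to
        ORDER BY id
    """
    return conn.execute(sql, {"t": as_of}).fetchall()


now_rows = query_vehicles(conn)
past_rows = query_vehicles(conn, as_of=12)
```

Because archived rows are never physically removed, calling `query_vehicles(conn, as_of=12)` again at any later time returns the same rows.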
Given this physical representation, the graph model is presented over the basic record type's table as discussed in the above-referenced patents. For example, the selection of table-valued functions (TVFs) may be used to represent each of the possible monadic binding points available for each predicate.
As shown above, for efficient performance, some implementations may handle the storage of temporal values based on the idea that facts or records that are “current” may be stored separately from facts or records as they have existed in the past. Fact-storing tables may be extended to include a single additional valid-from column that indicates a first point in time that the fact existed with the member values included in the corresponding row. To store the values of facts as they have existed in the past, each of these “current” tables has an “archive” table counterpart. As discussed, these tables will mimic the structure of non-temporal tables but with the addition of valid-from and valid-to columns. Additionally, valid-from may be appended to the primary key definition for each of the archive tables. Note that with the key defined as a composite of the corresponding record type's identity and its valid-from time, all states of a given fact will be stored contiguously in the corresponding archive table.
In one embodiment, rather than migrating a record from a current table to an archive table when it is logically deleted, records are added to the archive table during the lifetime of the record. If a value is created at t0 there will be a row in the corresponding current table with valid-from=t0 and a duplicate row in the archive table with valid-from=t0 and valid-to=NULL. If this value is modified at time t1, the corresponding row in the current table will be modified in-place to reflect the change and have its valid-from value updated to be t1; the existing row in the archive table will be modified to reflect valid-to=t1; and a new row will be added to the archive with valid-from=t1 and valid-to=NULL. If the row is subsequently deleted at time t2 it will no longer exist “currently” (in the logical sense), so the row will be deleted from the current table and the row in the archive table will be modified to have valid-from=t1 and valid-to=t2.
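The create, modify, and delete sequence of this embodiment (creation at t0, modification at t1, deletion at t2) can be traced with a short `sqlite3` sketch. The table names `cur` and `arc`, the helper functions, the sample values, and the integer times 0, 1, and 2 are hypothetical and serve only to illustrate the bookkeeping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cur (id INTEGER PRIMARY KEY, value TEXT, valid_from INTEGER);
    CREATE TABLE arc (id INTEGER, value TEXT, valid_from INTEGER,
                      valid_to INTEGER, PRIMARY KEY (id, valid_from));
""")


def _close_open_row(conn, fact_id, t):
    """Set valid_to on the archive row that is still open (valid_to IS NULL)."""
    conn.execute(
        "UPDATE arc SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (t, fact_id))


def create(conn, fact_id, value, t):
    with conn:
        conn.execute("INSERT INTO cur VALUES (?, ?, ?)", (fact_id, value, t))
        # Duplicate row in the archive with an open-ended lifespan.
        conn.execute("INSERT INTO arc VALUES (?, ?, ?, NULL)",
                     (fact_id, value, t))


def modify(conn, fact_id, value, t):
    with conn:
        # Update the current row in place and restart its lifespan at t.
        conn.execute("UPDATE cur SET value = ?, valid_from = ? WHERE id = ?",
                     (value, t, fact_id))
        _close_open_row(conn, fact_id, t)
        conn.execute("INSERT INTO arc VALUES (?, ?, ?, NULL)",
                     (fact_id, value, t))


def delete(conn, fact_id, t):
    with conn:
        conn.execute("DELETE FROM cur WHERE id = ?", (fact_id,))
        _close_open_row(conn, fact_id, t)


create(conn, 1, "A4", t=0)
modify(conn, 1, "A4 (revised)", t=1)
delete(conn, 1, t=2)
history = conn.execute(
    "SELECT value, valid_from, valid_to FROM arc WHERE id = 1 "
    "ORDER BY valid_from").fetchall()
```

At the end of the sequence the current table is empty, while `history` holds both states of the fact with adjoining lifespans, ordered by the composite key of identity and valid-from time.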
Number | Date | Country |
---|---|---|
20120158771 A1 | Jun 2012 | US |