The present invention is in the field of temporal databases and, more particularly, relates to updating and reading rows of a temporal database.
Tables in a relational database are characterized by, among other factors, the presence of a primary key column or columns. The primary key value for a row in a table, in the primary key column, uniquely identifies that row. Thus, every row in a relational database is uniquely identified by a primary key value. Deletions and updates to a relational table destroy old information, leaving only the most current row versions and/or omitting deleted rows. This type of database is sometimes referred to as a current-version database.
On the other hand, a temporal database is typically implemented as a variant of a relational database. A temporal database adds the dimension of time to relational tables. In a temporal database, each row is uniquely identified by a primary key value (like a current-version database), but each row is further qualified by time. Updates to a temporal database retain previous data values, identifying the prior versions of data (i.e., the prior row versions) as being older than current row versions. Thus, each row may have one or more versions.
One conventional temporal database implementation model is the valid-time table structure, which has two timestamp columns not typically present in current-version relational tables. The two timestamp values for a row version identify the starting and ending points (starting and ending timestamps, respectively) of a period of validity for that row version. The presence of both timestamps in each row version is considered both convenient and efficient for indexing and query evaluation.
Typically, then, temporal databases are queried “as of” a point in time. A row version in a valid-time table satisfies an “as of” query criteria if the “as of” time falls within the period characterized by the starting and ending timestamps for the row version. The row version containing both starting and ending timestamps can be thus evaluated on its own, without references to any prior or succeeding versions of the row. A deleted row is represented as all versions of the row having an ending point in the past (i.e., the omission of any current version of the row).
It is believed that the benefits of the valid-time structure generally outweigh the storage overhead of having two timestamps for each row version, even though the ending timestamp for each row version is often the same as the beginning timestamp for the succeeding row version. Thus, as the number of row versions increase, the number of redundant timestamps approaches fifty percent of the total number of beginning and ending period timestamps.
A method of accessing a version of a row in a temporal database includes checking at least a timestamp associated with the version of the row against a lock criteria for the row. Based on a result of the checking step, it is determined whether to access the version of the row. The version of the row is accessed without acquiring a lock for the row.
We now discuss some properties of temporal databases by illustrating examples of interactions with an illustrative temporal database. First, it is noted that, when and if a succeeding version of a row is to be inserted into the database, the ending timestamp of the now-current (and soon-to-be previous) version of the row is updated to a timestamp no later than the starting time of the succeeding version of the row. Being implemented in a relational database, both row versions, the previous row version and the new row version, are locked by the updater for the duration of the transaction creating the new row version.
The concept of locking row versions for the duration of a new version creation transaction (and some associated pitfalls therewith) is discussed with reference to
Continuing with the example discussed above with respect to
Furthermore, when the read eventually does occur, the read itself locks the rows being read. Thus, subsequent updating transactions are in turn delayed until the readers release their locks. The concurrency potential of the temporal database, then, may be considered reduced to that of the current-version database. Perhaps surprisingly, the presence of so much redundant data does not by itself necessarily translate into increased concurrency, because transactions updating the temporal database are acquiring and holding possibly twice the number of row locks that would be held when updating a current-version database.
An alternative is provided to relying on row locks for concurrency control in a temporal database, increasing concurrency. The row-locking and unlocking behavior is not necessarily disabled. Rather, the locking activity of the underlying relational database is “tolerated.”
To support reading of a current row version concurrent with a subsequent versions of the row being generated, the concept of a time-range lock is introduced. In one example, a table is provided containing a single timestamp column for a row indicating the start of the lock period, and the lock period end is always assumed to be “forever.” In an alternative example, the lock period ending timestamp is stored as well as the lock period starting timestamp. The lock period is not associated with particular primary key values per se. (Rather, the lock period applies to row versions associated with a particular branch. The concept of branching is discussed later.)
With the time-range locking table in place, updating transactions begin by posting a row version that is within the period of a locked time-range. A row version “within” the period of a locked time-range lock is considered unavailable for normal reading. (What is “within” in some examples is discussed later.) The updating transaction has responsibility for later removing the posted time-range lock. If the time-range is already posted as locked, the updating transaction posts an additional time-range lock entry, similarly taking responsibility for removing it later.
For example,
In order to process concurrently with updaters, readers not only avoid waiting for row locks to be released, but readers also avoid acquiring database read locks. That is, with respect to avoiding waiting for row locks to be released, readers avoid accessing data of row versions whose validity period is contained within a time-range lock period for the row. Readers may be exposed to reading versions of rows that are locked for update by the database manager, because the lock period ending timestamp is changing in these rows. The possible allowable replacement values for these lock period ending timestamps are, however, bounded by the time-range lock period definition. Thus, for the purpose of deciding whether to access a row version, a row version with a lock period ending timestamp falling anywhere within a lock period for the row is interpreted as having a lock-period ending timestamp matching the end of the lock period for the row.
In one example, the lock period ending timestamp for any row version with a lock period ending timestamp greater than the start of a lock period for the row is interpreted as having the value “forever.” The ability to compute a valid high value for the lock period ending timestamp shields the reader from a possibly uncommitted lock period ending timestamp value. Significantly, a reader accessing only row versions with lock period starting timestamps outside of the range lock period for the row can safely evaluate these row versions without acquiring row locks. Readers are not blocked by updaters, nor are updaters blocked by readers, as only the updaters are acquiring row locks.
Using the time-range locking method described, concurrency among updaters can be raised by the introduction of branching. By itself, branching of a temporal database is known, for example, to analyze alternate plans, over time, to meet projected transportation system demand levels. Each branch could represent a particular transportation plan and its possible outcome. In other words, time is treated as being multi-dimensional, rather than one-dimensional. A branched and temporal database allows concurrent timelines to proceed within separate branches of the database.
In accordance with an example, an updater of a row of a branched and temporal database, holding a time-range lock, blocks other updaters only to the extent of the row-locking mechanism of the underlying relational database manager. That is, updaters of rows having identical primary key values in the same branch are subject to waiting behind one another. However, updaters of rows having different primary key values are not forced to wait behind one another, nor are updaters of rows in different branches (even rows having the same primary key value) forced to wait behind one another.
For applications modifying a database using large transactions, or modifying the database in a manner creating temporary inconsistencies that should not be viewed by readers, the database remains available for reading regardless of the locking activity or temporary inconsistencies of the update processing.
Branching is not necessarily a substitute for time-range locking.
In the
We now discuss some particular examples of steps for accomplishing the use of time-range lock periods, as discussed above.
First, we describe an example of steps to set up database structures to enable the use of time range lock periods.
Steps Involved in Setting up the Database Structures:
1. Create a table for branch definitions.
2. Create a table for time-range lock definitions.
3. Add a foreign key to associate a time-range lock with a branch.
4. For each application table with a primary key:
Next, we describe an example of steps to create a new child branch.
Steps Involved in Creating a New Child Branch:
Next, we describe two examples of steps to update user data:
First example of steps involved in updating user data in a database (the time-range locks are read without acquiring locks):
Next, we describe an example of steps to read a database
Steps involved in reading from the database:
1. Choose which branch to read (root branch is default).
2. Configure the reading application to use uncommitted read isolation (to not acquire read locks).
3. Computing query results:
Alternatively, if read lock acquisition behavior can be more precisely controlled, locks could be acquired when reading the time-range lock definitions, with a corresponding alteration of the updating application, as noted in the earlier section.
Before concluding, we provide herewith a brief summary of the various time indications discussed herein:
First, from the perspective of a child branch, the branch definition is characterized by a timestamp (the branch floor). The branch contains only row versions newer than this branch floor timestamp. The chosen timestamp is arbitrary, though the typical branch time will either be the time that the branch is created, or some time in the near past (e.g., when the last accounting period closed). The branch floor does not change or go away with time-range locking activity.
A temporary time-range lock (or locks) could exist on the child branch, restricting the branch versions available for reading. The earliest lock-range timestamp is the lock ceiling for the branch. Rows created more recently than the lock ceiling are not yet eligible for reading. Because time-range locks are temporary, sometimes there will be no lock ceiling.
Ancestor rows are available for inheritance if the following conditions are met:
We now provide a specific example, to illustrate the use of time-range locks relative to branching. Assume that branch A is a branch of the root. Branch A has a branch floor of Jan. 1, 2005, 2:15 PM. Assume that branch B is a branch of branch A. Branch B has a branch floor of Jan. 3, 2005, 3:35 PM.
Ignoring time-range locks for the moment:
Now, accounting for time-range locks:
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents, which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.