The invention concerns a database system which provides multiple views of the database. The system assures that, when a user reads views, consistent data is delivered to the user.
For example, assume that the database is a nationwide telephone directory. A user may issue a query requesting retrieval of all telephone numbers assigned to parties named Miller, who live on Main Street, in all cities nationwide. The management system will return these telephone numbers to the user.
In many situations, it is convenient for users of the database DB to deal with a subset of the database, rather than with the database itself. Further, it also may be convenient for these subsets to be formatted differently, in order to suit the users' preferences.
These subsets are termed “views.” Continuing the example given above, one view may contain all telephone data within the state of New Jersey. If the user issues the same query identified above, but to this view instead of to the database as-a-whole, only telephone numbers of parties in New Jersey would be retrieved.
Views are generated, or defined, through the use of queries. A view is either virtual or materialized. A virtual view is not physically stored as a subset of data in permanent storage, such as a fixed drive or tape. Rather, it is computed on demand by executing the query which generates the view, and the results of the query are stored in system memory.
In a materialized view, a query also generates, or defines, the view. However, unlike a virtual view, the results of the query which generates the materialized view are stored in permanent storage.
With the use of materialized views, multiple instances of a single piece of data can exist. For example, an original piece of data can exist in the database, and copies of that same data can exist in materialized views. If one of these instances of data changes, then a person reading two copies of the same underlying data may see different values of the data. For instance, in the example given above, if Miller's telephone number has changed, the person might see both Miller's current and previous phone numbers. In many situations, this inconsistency cannot be tolerated.
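The difference between a virtual and a materialized view, and the stale copy that a materialized view can present, may be illustrated by the following minimal sketch. The in-memory list `directory`, the function names, and the telephone numbers are all hypothetical and are not part of the described system; a Python list stands in for permanent storage.

```python
# Hypothetical base data: (name, street, state, phone) tuples.
directory = [
    ("Miller", "Main Street", "NJ", "555-0100"),
    ("Miller", "Main Street", "NY", "555-0101"),
    ("Smith", "Oak Avenue", "NJ", "555-0102"),
]

def nj_query(db):
    """The query that defines the New Jersey view."""
    return [row for row in db if row[2] == "NJ"]

def virtual_view(db):
    # Virtual view: recomputed from the base data on every read.
    return nj_query(db)

# Materialized view: the query result is computed once and stored.
materialized_view = nj_query(directory)

# A base update arrives after the view was materialized.
directory[0] = ("Miller", "Main Street", "NJ", "555-0199")

fresh = virtual_view(directory)   # reflects the new number
stale = materialized_view         # still holds the previous number
```

A reader who consults both `fresh` and `stale` sees two different numbers for the same party, which is the inconsistency described above.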
These inconsistencies can be caused by transactions which modify the database. A database transaction can be viewed as a series of commands starting with a “BeginTransaction” command and completing with either an “AbortTransaction” or “CommitTransaction” command. An “AbortTransaction” command rolls back all work performed by the transaction, and returns the database to the condition prevailing prior to the “BeginTransaction” command. A “CommitTransaction” command causes the transaction to take effect, and makes the results of the transaction durable, by storing sufficient information on stable storage (e.g., disk) to ensure that none of the transaction's actions will be lost.
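The begin, abort, and commit behavior just described may be sketched as follows. The class `TinyDB` and its method names are assumptions made for illustration only; a dictionary stands in for stable storage, and the sketch handles a single transaction at a time.

```python
import copy

class TinyDB:
    """Toy single-transaction store (hypothetical, for illustration)."""

    def __init__(self):
        self.stable = {}       # stands in for stable storage (e.g., disk)
        self._working = None   # uncommitted state of the open transaction

    def begin_transaction(self):
        # Work on a private copy so that an abort can discard it.
        self._working = copy.deepcopy(self.stable)

    def write(self, key, value):
        self._working[key] = value

    def abort_transaction(self):
        # Roll back: the stable state is left exactly as it was.
        self._working = None

    def commit_transaction(self):
        # Make the transaction's results take effect.
        self.stable = self._working
        self._working = None
```

For example, a write followed by `abort_transaction()` leaves `stable` unchanged, whereas the same write followed by `commit_transaction()` makes the new value visible in `stable`.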
The data in the database is stored in the form of tuples. Before a transaction reads or writes a tuple, the appropriate read- or write-lock must be acquired. These locks prohibit other parties from gaining access to the locked data. This prohibition prevents the other parties from reading or modifying the data in a manner different from the transaction's modifications, and thereby prevents inconsistencies from arising.
To perform any of these transactional tasks, the underlying database transaction manager must be invoked. Transaction managers having the capabilities described above are known in the art. However, existing managers, while preventing the inconsistencies described above from occurring in base data of the database itself, do not necessarily prevent inconsistencies from occurring in transactions which read materialized views of the database.
In one form of the invention, a database manager generates views. When a transaction seeks to issue a read-lock on a target tuple in a view, the invention attempts to lock a superset of tuples in the database. If certain conditions are met, the attempted lock succeeds.
The superset contains the tuples from which the target tuple is derived. Locking the superset prohibits changes in the superset-tuples, which may cause inconsistencies between the superset-tuples and the target tuple. However, the superset may also contain tuples which are not involved in deriving the target tuple, so that unrelated tuples may become locked. A trade-off occurs.
On the one hand, it is computationally expensive to identify a minimal set of tuples in the database from which the target tuple is derived, and lock only that minimal set. On the other hand, it is inexpensive to identify the superset. The disadvantage of locking the superset, including extra tuples, is seen as offset by the convenience in avoiding computation of the minimal set.
The invention provides extensions to the capabilities of existing transaction managers, including three new routines for reducing inconsistencies which these managers can produce.
One routine eliminates inconsistencies entirely. The other two eliminate inconsistencies entirely if certain conditions hold. A particular transaction will either (1) use the conventional transaction manager, or (2) repeatedly use exactly one of these three extended routines, in the course of executing transactional tasks.
Logic executed by the invention will be explained by reference to flow charts. In the flow charts, the symbols “T” and “Tm” refer to a transaction, which is a group of operations; “V” and “VS” refer to views; “U” refers to a base tuple, which is a tuple contained in a database, and which can be either written to, or read; “Tu” refers to a view tuple, which is a tuple contained in a view, and which can be read, but not written to, by database users.
A materialized view tuple can be modified as part of maintenance to bring it up-to-date with the underlying base data.
Input: View V
Output: Materialization of V will be made consistent with the Current State of the Base Tables from which V is Derived.
In
If refreshing is not required, then block 6,060 is reached, and the processing terminates. In this case, the view has been maintained, but not refreshed. A view is refreshed when maintenance must modify its contents.
If refreshing is required, block 6,020 is reached, which places into a BASESET all base relations of the database which are needed to derive view V. These base relations are identified through a dependency graph G.
View V1 is derived from a single source, namely, base table B1. However, view V10 is derived from two sources, namely, base tables B4 and B5. A preferred approach to identifying the base relations in the dependency graph is through use of a depth- or breadth-first traversal, as indicated in FIG. 3.
In effect, by using the dependency graph, block 6,020 identifies all relations, also called tables, needed to construct view V. These relations will be updated in later steps.
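The identification of the BASESET by traversal of the dependency graph may be sketched as follows. The dictionary representation of graph G and the function name `base_set` are assumptions; the entries for V1 and V10 follow the example above, while `V_demo` is a hypothetical view defined over other views, added only to show that the traversal reaches the underlying base tables.

```python
def base_set(graph, view):
    """Collect the base tables reachable from `view` by depth-first traversal.

    `graph` maps each view to the relations it is derived from. Any name
    that does not appear as a key is treated as a base table.
    """
    result, stack, seen = set(), [view], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        sources = graph.get(node)
        if sources is None:
            result.add(node)        # a base table: add it to the BASESET
        else:
            stack.extend(sources)   # a view: keep descending
    return result

# Hypothetical dependency graph G.
G = {"V1": ["B1"], "V10": ["B4", "B5"], "V_demo": ["V1", "V10"]}
```

Here `base_set(G, "V10")` yields B4 and B5, and `base_set(G, "V_demo")` yields B1, B4, and B5.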
Next, block 6,030 is reached, which reads the logs for all base relations in the BASESET. Logs store information about changes made to the base relations. When a base relation's contents are changed, information about the changes is stored in a log for that relation. The logs allow previous states of the relation to be reconstructed, and are used, for example, if the current relation becomes corrupted.
Block 6,030 reads the log entries for all base relations in the BASESET. That is, all log entries for all relations in the database which are necessary to produce view V are read (i.e., all relations that appear in the query that defines V). Within these relations, block 6,040 identifies the tuples which have changed for view V, using the dependency graph G. When the changed tuples have been identified, block 6,050 writes the changed tuples to the materialization of view V.
For maintenance purposes, the invention treats all views used in the definition of other views as virtual. Hence, the invention performs maintenance only in terms of the underlying base data. For instance, view V16 in
After block 6,050, block 6,060 is reached, ending the maintenance routine. The routine then returns to the point in the program which called the maintenance routine.
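The maintenance routine may be sketched, under simplifying assumptions, as follows. The sketch assumes that the view is a simple selection over its base relations (no joins), that each log is a list of `(op, key, row)` entries, and that the BASESET is supplied directly rather than derived from the dependency graph. The class `View` and the function `maintain` are hypothetical names.

```python
class View:
    """A materialized selection view (hypothetical, for illustration)."""

    def __init__(self, tables, predicate):
        self.tables = tables                    # BASESET, given directly here
        self.predicate = predicate              # selection defining the view
        self.rows = {}                          # materialization: (table, key) -> row
        self.applied = {t: 0 for t in tables}   # log position already applied

def maintain(view, logs):
    """Bring view.rows up to date; return True if the view was refreshed."""
    pending = {t: logs[t][view.applied[t]:] for t in view.tables}
    if not any(pending.values()):
        return False                            # maintained, but not refreshed
    for table, entries in pending.items():      # read the logs (block 6,030)
        for op, key, row in entries:            # changed tuples (block 6,040)
            if op == "delete" or not view.predicate(row):
                view.rows.pop((table, key), None)
            else:
                view.rows[(table, key)] = row   # write to V (block 6,050)
        view.applied[table] = len(logs[table])
    return True
```

Calling `maintain` twice in succession without intervening base changes returns `True` and then `False`, corresponding to the case in which the view has been maintained but not refreshed.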
Therefore, in
INPUT: Tuple(Tu), from View V, which is to be Refreshed
OUTPUT: Refresh is Performed
In
In practice, a superset of DS(Tu) is used which is easy to compute, rather than the exact set DS(Tu) which may be quite expensive to compute. For simplicity, the term “DS(Tu)” will be used to mean a particular superset of DS(Tu). A particularly simple-to-compute superset of DS(Tu) is the set of all tables mentioned in the query defining the view V which contains Tu.
Another algorithm for computing DS(Tu) is found in “Concurrency Control Theory for Deferred Materialized Views,” by A. Kawaguchi, D. Lieuwen, I. Mumick, D. Quass, and K. Ross, in pp. 306-320 (esp. pages 312, 313) of “Database Theory—ICDT'97, 6th International Conference Proceedings,” Delphi, Greece, January, 1997, published by Springer, Berlin, Lecture Notes in Computer Science 1186.
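The simple-to-compute superset described above, namely every table mentioned in the query defining the view, may be sketched as follows. The catalog `VIEW_TABLES`, the function name `ds_superset`, and the representation of a base tuple as a `(table, key)` pair are assumptions made for illustration.

```python
# Hypothetical catalog: view name -> base tables named in its defining query.
VIEW_TABLES = {"V1": {"B1"}, "V10": {"B4", "B5"}}

def ds_superset(view_name, base_tables):
    """Return the id of every tuple in every table the view's query mentions.

    No attempt is made to find the exact tuples from which a particular
    view tuple is derived; the whole of each mentioned table is taken.
    """
    return {
        (table, key)
        for table in VIEW_TABLES[view_name]
        for key in base_tables[table]
    }
```

The result is independent of which view tuple Tu is being read, which is what makes it inexpensive: the exact DS(Tu) is always contained in it, at the cost of also covering unrelated tuples.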
If the answer is No, then block 8,070 is reached, and processing terminates. If the answer is Yes, then block 8,020 is reached, in which the log entries for the tuples in DS(Tu) are read from a log.
As explained above, log entries indicate the changes which have been made to an original base tuple. Block 8,030 reads the tuples in DS(Tu). Block 8,040, using the log entries of block 8,020 and the tuples of block 8,030, computes the changes made to the original tuple Tu, and modifies the original to reflect the changes. Now the original tuple Tu has been modified to be current. Processing terminates in block 8,070.
Therefore, in
INPUT: operation I {r[u], w[u], rv[Tu], BeginTransaction, CommitTransaction, AbortTransaction} of Transaction T, wherein:
OUTPUT: Wait-or-Proceed Decision, so that 2PL Schedules with Strict Currency are Produced.
All operations except rv[Tu] are standard operations supported by the locking subsystem of any transaction manager supporting two-phase locking. Two-phase locking is known in the art. An extensive treatment of how to build the mechanisms underlying two-phase locking can be found in Jim Gray and Andreas Reuter, Transaction Processing: Concepts and Techniques, 2nd printing, Morgan Kaufmann, 1993.
The invention builds, on top of a known transaction manager, a locking protocol which properly handles views. The Specification will describe this by making reference to the 2PL routine called in the flow charts of FIG. 5 and others.
The underlying transaction management machinery knows nothing of views. Hence, it will treat a view tuple just like a base tuple in terms of locking. This leads to inconsistencies if additional machinery is not employed. However, it also means that the machinery can be used to lock view tuples to prevent other transactions from reading the maintained tuples until the transaction that does the maintenance completes.
In brief, two-phase locking entails two phases: a growing phase, and a shrinking phase. The growing phase exists while a transaction is requesting that locks be granted. However, once the transaction releases one, or more, locks, the shrinking phase begins. During the shrinking phase, no further locks can be acquired by the transaction.
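The two-phase rule may be sketched as follows. The class `TwoPhaseTransaction` is a hypothetical name; the sketch tracks only the phases of a single transaction and does not model conflicts or waiting between transactions.

```python
class TwoPhaseTransaction:
    """Enforces the growing/shrinking discipline for one transaction."""

    def __init__(self):
        self.held = set()
        self.shrinking = False   # False while in the growing phase

    def acquire(self, item):
        if self.shrinking:
            # Once any lock has been released, no further lock may be taken.
            raise RuntimeError("2PL violation: lock requested after a release")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True    # the first release starts the shrinking phase
        self.held.discard(item)
```

A transaction that acquires two locks, releases one, and then requests a third is rejected at the third request.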
In
Next, block 9,040 calls the routine MAINTAIN(V), which was described in connection with FIG. 3. Alternatively, the process RefreshTuple(Tu) in
Optimizations are possible. For example, if the transaction has already performed MAINTAIN(V) and has not modified any tuples of the base relations used in the definition of V, then the transaction need not re-execute MAINTAIN(V). Similar optimizations will be readily apparent to those skilled in the art.
Block 9,050 requests a read lock from the underlying storage manager for the view tuple Tu. The underlying storage manager is not aware that Tu is anything more than a standard tuple, and so it can lock Tu using normal procedures. To guarantee consistency, both the locks on DS(Tu) and on Tu are required. Processing terminates in block 9,070.
INPUT: Operation I {BeginTransaction(ReadSet), r[u], w[u], rv[Tu], AbortTransaction, CommitTransaction} of Transaction T, wherein:
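The handling of a read-lock request on a view tuple under strict currency may be sketched as follows. The function name and the callables `ds`, `lock_read`, and `maintain` are hypothetical stand-ins for the DS(Tu) computation, the underlying 2PL lock request, and the MAINTAIN(V) routine. The ordering of the first step is inferred from the requirement that locks on both DS(Tu) and Tu be held.

```python
def read_view_tuple_strict(tu, view, ds, lock_read, maintain):
    """Process rv[Tu]: lock the superset, maintain the view, lock the tuple."""
    for base_item in sorted(ds(tu)):   # read-lock every item in DS(Tu)
        lock_read(base_item)
    maintain(view)                     # block 9,040: MAINTAIN(V)
    lock_read(tu)                      # block 9,050: lock the view tuple itself
```

Because the base items are locked before maintenance runs, no other transaction can change them between the time the view is brought up to date and the time Tu is read.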
OUTPUT: Wait, Proceed, or Abort Decision, so that 2PL Schedules with Loose Currency are Produced. Views in ReadSet are maintained.
In
If the NO branch is taken from block 10,020, then decision block 10,030 is reached. This block inquires whether operation I is the beginning operation in a transaction T, wherein transaction T contains a pre-declared read set, named ReadSet. If not, processing proceeds to block 10,040 and the underlying transaction manager handles the total request. Processing terminates in block 10,050.
If operation I does represent the beginning of such a transaction T, then block 10,100 in
The pre-declaration of the read set also acts as a request for maintenance of all views in ReadSet. Consequently, special operations must be undertaken in order to handle transaction T. In block 10,105, a read lock is imposed on every base data item listed in the read set. The locks are acquired using the underlying 2PL routine used elsewhere (e.g., block 10,040). The term “base data item” refers to data items within the base relations, as opposed to items in views.
Next, block 10,200 assigns to a variable VS the set of views in ReadSet. Then, block 10,300 spawns, or launches, a maintenance transaction Tm. The maintenance transaction Tm maintains all views in set VS, using the maintain routine of FIG. 3. After completing, Tm returns its transaction identifier, M, or an abort indication.
If, in decision block 10,400, maintenance transaction Tm aborted, then block 10,500 is reached, which calls a 2PL abort routine which operates on transaction T. This 2PL routine restores the status quo to the system, returning the system to its condition prior to initiation of transaction T, since T has been aborted. Then, as indicated, block 10,050 is reached.
If maintenance transaction Tm does not abort, then, in block 10,600, a check is made to see if any of the views in VS were refreshed after Tm committed. In order to allow this check, the transaction identifier of the last transaction to refresh a view V is stored in the database, or in some other place, such as a server in the network.
Some system, such as the database or a server, also keeps track of the sequence of transaction commits. If any of the views were refreshed after Tm committed, block 10,500 is reached, and the transaction T aborts. Transaction T is aborted because, if the views were maintained at different times, the possibility of inconsistency exists.
Another maintainer has refreshed at least one view after Tm completed. Continuing might lead to inconsistent results.
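The handling of a BeginTransaction(ReadSet) operation under loose currency may be sketched as follows. All names are hypothetical: the read set is passed as separate collections of base items and views, `run_maintenance` stands in for the maintenance transaction Tm and returns its identifier M or `None` on abort, `last_refresher` maps each view to the identifier of the last transaction that refreshed it, and `commit_order` maps transaction identifiers to commit sequence numbers.

```python
def begin_with_read_set(base_items, views, lock_read, run_maintenance,
                        last_refresher, commit_order):
    """Return "proceed" or "abort" for a transaction with a declared read set."""
    for item in base_items:                  # block 10,105: lock base data items
        lock_read(item)
    vs = set(views)                          # block 10,200: VS = views in ReadSet
    m = run_maintenance(vs)                  # block 10,300: spawn Tm
    if m is None:                            # block 10,400: did Tm abort?
        return "abort"                       # block 10,500
    for v in vs:                             # block 10,600
        r = last_refresher.get(v)
        if r is not None and commit_order[r] > commit_order[m]:
            return "abort"                   # a view was refreshed after Tm committed
    return "proceed"
```

The final loop is the check that no other maintainer has refreshed any view in VS after Tm committed; if one has, the views may reflect different states of the base data, and T is aborted.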
As one summary of the preceding: the logic of
INPUT: Operation I (BeginTransaction, AbortTransaction, CommitTransaction, r[u], w[u], rv[Tu]) of Transaction T, wherein:
OUTPUT: Wait, Proceed, or Abort Decision, so that 2PL Schedules with Periodic Currency are Produced. Views in ReadSet are maintained.
In
If operation I does not represent the beginning of transaction Tr, then, by inference, operation I is not the beginning step of a set of operations, but one of the operations themselves. Decision block 11,060 is reached, which inquires whether operation I requests a read lock on a tuple Tu of view V. If not, block 11,020 is reached, and the underlying transaction manager is handed the request I for processing. Processing completes in block 11,030. If operation I requested a read lock on tuple Tu of View V, block 11,070 is reached.
Block 11,070 inquires whether V was refreshed at a different time than any of the other views seen by the current transaction. If so, block 11,075 is reached and the transaction is aborted, since inconsistent data might otherwise be seen. If not, then transactionally consistent views have been seen thus far.
The current view is added to the set of views seen thus far by the transaction in block 11,080. Block 11,090 acquires a read lock from the underlying transaction manager. Processing terminates at block 11,030.
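The handling of a read-lock request on a view tuple under periodic currency may be sketched as follows. The names are hypothetical, and the time at which a view was refreshed is represented, as an assumption, by the identifier of the transaction that last refreshed it (`refreshed_by`).

```python
def read_view_tuple_periodic(tu, view, seen_views, refreshed_by, lock_read):
    """Process rv[Tu] of view V; return "proceed" or "abort"."""
    for other in seen_views:                          # block 11,070
        if refreshed_by[other] != refreshed_by[view]:
            return "abort"                            # block 11,075
    seen_views.add(view)                              # block 11,080
    lock_read(tu)                                     # block 11,090
    return "proceed"
```

A transaction that has read from views refreshed together may continue to read from them, but is aborted on its first attempt to read a view that was refreshed at a different time.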
Numerous substitutions and modifications can be undertaken without departing from the true spirit and scope of the invention. What is desired to be secured as Letters Patent is the invention as defined in the following claims.