A database often includes objects, such as tables, stored in a database catalog. The objects may represent metadata related to actual data. Performance of a query related to an object is often based, for example, on the efficiency of location and retrieval of the object from the database catalog. Objects are also often considered to be related to each other based on dependencies. Factors such as the number of objects, the dependency structure of objects, changes to the dependency structure, etc., typically impact the efficiency of query performance.
Features of the present disclosure are illustrated by way of example and not limited in the following figure(s), in which like numerals indicate like elements, in which:
For simplicity and illustrative purposes, the present disclosure is described by referring mainly to examples thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure.
Throughout the present disclosure, the terms “a” and “an” are intended to denote at least one of a particular element. As used herein, the term “includes” means includes but not limited to, the term “including” means including but not limited to. The term “based on” means based at least in part on.
An object dependency management apparatus and method are described herein and provide snapshot and versioned read access, for example, to a database catalog or any structure generally for storing objects or related to storage of objects. Use of multiple contexts to allow for varying context-level refinement is also provided. The apparatus and method also provide for maintenance of object dependencies for facilitating efficient query performance and historical access. For example, the apparatus and method maintain versioned dependencies such that version numbers that uniquely identify each particular state of the database catalog are maintained and may be associated with a snapshot.
For the object dependency management apparatus and method, an object may be defined as any entity, such as a value, variable, function, or data structure, that is manipulated by machine readable instructions. For example, for objects related to a table and a user, a table may depend on the user who has ownership of the table. For example, an analysis of objects from a user viewpoint may need to determine all the tables that depend on a user. Further, an analysis of objects from a table viewpoint may need to determine which user a table depends from. The dependencies may be used, for example, to determine which tables should be dropped when a user is dropped or to determine which tables to audit given a user that is currently logged into a database. Every object may depend on more than one object and objects may have multiple dependents.
In order to provide concurrent access to the database catalog or other structure, the object dependency management apparatus includes a snapshot module. The snapshot module assigns appropriate version numbers to all changes (i.e., adds, edits, and drops) to an object so that references to a database catalog may filter between changes made before and after the requested version. Consider the following objects: a user-U, a user-V and a table-T. An add may be defined as the table-T being added as a dependent to user-U or user-V. An edit may be defined as a modification to the table-T. A drop may be defined as the table-T being removed as a dependent from user-U or user-V. The snapshot module maintains lists of added and dropped dependencies which hold object identifications (IDs) and the version in which they were added or dropped. Upon access, the adds and drops are resolved to determine which IDs were added but not yet dropped at the access requested version. As snapshots are closed, according to an example, a cleanup module uses an amortized approach to efficiently cleanup stale dependencies that will be seen as dropped to all existing and future snapshots.
To reduce shared lock contention and thereby increase performance, the snapshot module allows users to acquire a snapshot, or a consistent view of the database catalog at a given logical time, and continue to access that view until the user releases the snapshot. The view is consistent if none of the object properties or dependencies change in the eyes of the user when the snapshot is accessed, regardless of what changes may have occurred in the database catalog itself since the snapshot was accessed. Thus, on subsequent reads of the database catalog, the user may observe the state of the database catalog and the dependencies between its objects as they existed at the time of the snapshot. The user may not observe any further changes in the database catalog that occur after the snapshot was acquired, including the adding and dropping of objects. A subsequent snapshot would however observe the adds and drops. Therefore, the snapshot module generally supports reads against snapshots taken at different times.
The object dependency management apparatus and method also provide for the organization of objects using multiple contexts, or tiers, that form a tree structure and are accessed in a linear hierarchy from a leaf to the root. For example, a global context module may be provided for maintaining a global context for the database catalog and a session context module may be provided for maintaining multiple session contexts for each database session. Thus, the global context takes into account all changes to dependencies of objects, and the session contexts are based on a state of the database catalog at a specific time of a query related to an object. Thus, a session is specific to a user's connection to the database in that while some changes may exist in the user's session for them to observe and disappear when the session ends, other changes are committed to the global context. Further, multiple users may access a database catalog and obtain results specific to their own session contexts. Moreover, changes made by a particular user to a dependency of an object may be committed to the global context to then be viewed by subsequent users, while users who accessed the database catalog prior to the changes would not see the changes in their particular sessions. When determining the characteristics of an object from a lower context (i.e., a lower tier session context), access may begin with the root context (i.e., the global context) and proceed to the requesting context (i.e., the specific lower tier session context), applying changes from each intermediate context (i.e., each intermediate session context) along the way. Therefore, two sessions may see the same state from the global context, but each may maintain and apply their own unique changes at the session context level. The use of multiple contexts to differentiate and layer changes also provides a concise and organized way to manage alterations to the database catalog.
The object dependency management apparatus and method disclosed herein provide for the support of dependency queries against multiple fully-consistent snapshots of the database catalog in an order-of-growth runtime-efficient manner while maintaining space-efficient storage. Thus, each concurrent read of the database catalog observes changes made before the beginning of the reading process, even though the database catalog is constantly being updated. The cost of maintaining the dependencies is amortized over the adds, drops, and accesses to the database catalog.
The modules 101 and 105-108, and other components of the apparatus 100, may comprise machine readable instructions stored on a computer readable medium. In addition, or alternatively, the modules 101 and 105-108, and other components of the apparatus 100 may comprise hardware or a combination of machine readable instructions and hardware.
According to an example, the snapshot module 101 assigns appropriate version numbers (e.g., version numbers 1, 2, 3 . . . n) to all changes (i.e., adds, edits, and drops) to the dependency of the objects 102 so that references to the database catalog 103 can filter between changes made before and after the requested version. For example, as discussed above, for the objects user-U, user-V and table-T, the snapshot module 101 assigns appropriate version numbers to all changes to the dependency of these objects. Generally, the snapshot module 101 maintains two version ID lists per given object. An add version ID list is maintained for all the objects that are added as dependents to a given object, including the version at which they are added. A drop version ID list is maintained for all the objects that are dropped as dependents to a given object, including the version at which they are dropped. For example, for the objects user-U, user-V and table-T, adding a table-T as a dependent to a user-U results in the add version ID list for user-U being updated to reflect the addition of table-T. The snapshot module 101 may assign version number 1 to the addition of the table-T as a dependent to the user-U. In this case, no change would be made to the add or drop version ID lists for user-V. Similarly, dropping the table-T as a dependent from the user-U results in the drop version ID list for user-U being updated to reflect the dropping of table-T. The snapshot module 101 may assign version number 2 to the dropping of the table-T as a dependent from the user-U. Likewise, in this case, no change would be made to the add or drop version ID lists for user-V. The snapshot module 101 may assign subsequent version numbers to the adding or dropping of the table-T as a dependent to or from the user-U. Similarly, the snapshot module 101 may assign version numbers to the adding or dropping of the table-T as a dependent to the user-V. For example, adding the table-T as a dependent to the user-V results in the add version ID list for user-V being updated to reflect the adding of table-T. The snapshot module 101 may assign version number 3 to the adding of the table-T as a dependent from the user-V. In this case, no change would be made to the add or drop version ID lists for user-U.
When a query is made for historical access related to an object 102 stored in the database catalog 103, the query specifies the version at which the dependents are desired. For example, for a query related to the objects user-U, user-V and table-T, a query may specify retrieval of a version-x (e.g., version-2 or version-3) of a table-T related to user-U (or user-V). This version number may be used to sweep added and dropped object lists. For example, the version number may be used to add objects created at or below that version to a result list and remove any objects with drops at that version or below.
In order to account for the added and dropped objects for the result list, the result list may be based on the principle that every add has a corresponding drop in the current context or in a higher one. Therefore, resolving from the root context (i.e., global context) to requesting contexts ensures this invariant of add before drop is maintained. Further, the same ID may be added multiple times as long as there is a drop at a version in between each time the ID is added.
Since snapshot versions are monotonically increasing (i.e., every change increments the version number), the cleanup module 107 (described in further detail below) may permanently remove matching add-drop pairs whenever the version of the drop is lower than the lowest version of all the snapshots currently live (i.e., snapshots that are currently in use). Further, the cleanup module 107 may permanently remove matching add-drop pairs whenever the version of the drop is lower than the latest version of the database catalog 103 if there are no currently live snapshots.
An example of operation of the object dependency management apparatus 100 is described for further facilitating a description of the operation of the global and session context modules 105, 106, and the cleanup module 107.
In an example including objects user-U and dependent tables T1 and T2, table T1 may be designated by ID 1 and the table-T2 may be designated by ID 2. The following operations may be performed on objects ID 1 and ID 2:
i) Add dependent ID 1 at version 1.
ii) Add dependent ID 2 at version 2.
iii) Drop dependent ID 2 at version 3.
iv) Drop dependent ID 1 at version 4.
v) Add dependent ID 1 at version 5
The add and drop version ID lists for user-U related to dependent tables T1 (i.e., ID 1) and T2 (i.e., ID 2) are as follows (ID, version):
When a query is made for historical access related to objects ID 1 or ID 2 stored in the database catalog 103, the query specifies the version at which the dependents are desired. For example, based on the analysis performed by the query analysis module 108, a query at version 1 will not find any drops and will find the add of ID 1. Thus, for the query received at the input 109 related to version 1, the result list would result in the object ID 1 generated by the output 110.
In another application of the foregoing example, based on the analysis performed by the query analysis module 108, a query at version 2 will not find any drops and will find the adds of objects IDs 1 and 2. Thus, for the query received at the input 109 related to version 2, the result list would result in the objects IDs 1 and 2 generated by the output 110.
In a further application of the foregoing example, based on the analysis performed by the query analysis module 108, a query at version 3 will find a drop of object ID 2, then find the corresponding add of object ID 2 and remove it from consideration. Finally, the query at version 3 will find the add of object ID 1. Thus, for the query received at the input 109 related to version 3, the result list would result in object ID 1 generated by the output 110.
In yet another application of the foregoing example, based on the analysis performed by the query analysis module 108, a query at version 5 will first find an unmatched add for object ID 1 at version 5, meaning that object ID 1 will be part of the result list. This is followed by matched adds and drops for object IDs 1 and 2 at the lower versions, which are thus ruled out. This is because there should not be consecutive adds without a drop in between for the same ID. Thus, for the query received at the input 109 related to version 5, the result list would result in object ID 1 generated by the output 110.
Thus, based on a query received at the input 109, the query may include a specific version at which the state of dependent IDs (i.e., the dependent IDs that have already been added and have not yet been dropped), is desired. If no version is provided, the query is assumed to be based on the highest version. Based on the version, the query analysis module 108 may first iterate the add and drop version ID lists for the particular objects subject to the query until any versions greater than the request version are omitted. This is because any versions greater than the requested version are not part of the result set. For example, in the foregoing example for the query at version 3, the query analysis module 108 ignores the dependent ID 1 drop at version 4. The query analysis module 108 then effectively merges the add and drop version ID lists, while taking into account that every drop has a corresponding add. Specifically, the add and drop version ID lists may be iterated in decreasing version order, with all drops being temporarily saved and the adds being checked against the temporarily saved drops. In the foregoing example for the query at version 3, the query analysis module 108 may begin by temporarily saving object ID 2 because it was dropped at version 3. Because the query analysis module 108 iterates in decreasing version order, the drop for any add that does have a corresponding drop will be seen before seeing the add. Therefore, any added ID not found as a drop in the temporary structure is part of the result set. Continuing with the foregoing example for the query at version 3, a corresponding add is found for the drop on object ID 2, ruling object ID 2 out of the result set, and an add without a drop for object ID 1. Thus, for the query received at the input 109 related to version 3, the result list would result in object ID 1 generated by the output 110.
Operation of the global context module 105 is described for maintaining a global context for the database catalog 103. Further, operation of the session context module 106 is described for maintaining multiple session contexts for database sessions (e.g., a session context or multiple session contexts for each database session).
The global context module 105 provides for maintenance of a global context for the database catalog 103. Further, the session context module 106 provides for maintenance of multiple session contexts for database sessions (e.g., a session context or multiple session contexts for each database session). Thus, the global context takes into account all changes to dependencies of objects, and the session contexts are based on a state of the database catalog at a specific time of a query related to an object. Referring to
For multiple, layered contexts (i.e., the global and session contexts), while the foregoing aspects related to drops having corresponding adds remains applicable, multiple, layered contexts make it possible for these pairings to cross context boundaries. For example, for the foregoing query example related to user-U that includes dependent tables T1 and T2 (i.e., objects ID 1 and ID 2), the global context module 105 may add a global context based dependency (1,1) and the session context module 106 may add a session context based refinement with a drop of (1,4). In order to handle the session context, the session context module 106 may separate its dropped ID list into two lists. The first drop ID list may be determined as described earlier (i.e., the foregoing drop version ID list). The second dropped ID list may hold IDs that do not have corresponding adds in the context. The separated dropped ID lists may be created during access by moving drops that do not find matching adds to the second dropped ID list. By querying a higher session context for its dependents (which may recursively query its higher session context, if one exists) and removing IDs from the result set that are found in the second dropped ID list, the correct dependents result set remains. Specifically, the correct dependents result set that accounts for all the refinements by lower contexts against the content of higher contexts remains.
Operation of the cleanup module 107 which uses an amortized approach to efficiently remove dependencies is described.
While iterating the add and drop version ID lists for access, any add or drop pairs older than the version being currently used may be deleted entirely. In order to improve efficiency of access to the database catalog 103, the cleanup module 107 may amortize cleanup costs of stale add/drops. For example, when the number of new drops equals the number of adds outstanding, the cleanup module 107 may detect matching pairs of adds and drops that may have become stale. This ensures that the database catalog 103 will never exceed having double the number of IDs (i.e., each ID has an add and a drop) because of staleness. As long as a snapshot is open at a particular version number, no ID with a greater drop version may be considered to be stale and neither may its corresponding add.
The efficiency of the cleanup module 107 is further demonstrated by considering an unamortized implementation that stores the current state of all the dependencies at every snapshot. That is, every snapshot actually replicates the full dependency structure and associates it with the snapshot. This unamortized implementation will provide correct results. If there are k snapshots for an object with an average of n dependencies each per snapshot and c changes to the dependencies between subsequent snapshots (c being much smaller than n), the unamortized implementation will be time efficient. This is because the unamortized implementation may be used to locate the dependencies for a given snapshot in O(lg k), for example, with C++ stl-style maps and then iterate them to build the result set in O(n). However, the unamortized implementation will be expensive in terms of space and will be O(k*n). Compared to the foregoing amortized implementation provided by the cleanup module 107, the runtime is O(n+c) to output the dependents at a given version. Likewise, for the foregoing amortized implementation provided by the cleanup module 107, the space use is O(n+c). Thus the foregoing amortized implementation provided by the cleanup module 107 provides both time and space efficiency.
Referring to
At block 302, the versions may be assigned to dependency changes between the objects. For example, referring to
At block 303, historical access to the objects may be managed based on an analysis of the versions and the dependency changes between the objects. For example, referring to
Referring next to
At block 402, based on the query, the query analysis module 108 may access the appropriate modules to process the query. For example, referring to
At block 403, a version may be assigned to each unique state of a structure for storing objects corresponding to a dependency change to one of the objects. For example, referring to
At block 404, a corresponding version may be assigned to each dependency change between the objects. For example, referring to
At block 405, a snapshot of a state of the structure corresponding to the query related to one of the objects may be generated. For example, referring to
At block 406, an add version ID list for all objects that are added as dependents to a given object may be generated. Further, a drop version ID list for all objects that are dropped as dependents from a given object may be generated. The add and drop version ID lists may be used to determine a dependency state of an object. For example, the add and drop version ID lists may be iterated in decreasing version order to determine the dependency state of the object.
At block 407, a global context of the states of the structure corresponding to all of the versions of the states of the structure may be generated. Further, a session context corresponding to a state of the structure based on a time of a query related to one of the objects may be generated. For example, referring to
At block 408, the dependency changes between the objects may be removed by amortizing adds, drops and accesses to the structure. For example, the dependency changes between the objects may be removed by amortizing dependency changes that are no longer needed for historical access. For example, referring to
At block 409, the results based on the query may be generated. For example, referring to
The computer system includes a processor 502 that may implement or execute machine readable instructions performing some or all of the methods, functions and other processes described herein. Commands and data from the processor 502 are communicated over a communication bus 504. The computer system also includes a main memory 506, such as a random access memory (RAM), where the machine readable instructions and data for the processor 502 may reside during runtime, and a secondary data storage 508, which may be non-volatile and stores machine readable instructions and data. The memory and data storage are examples of computer readable mediums. The memory 506 may include modules 520 including machine readable instructions residing in the memory 506 during runtime and executed by the processor 502. The modules 520 may include the modules 101 and 105-108 of the apparatus shown in
The computer system may include an I/O device 510, such as a keyboard, a mouse, a display, etc. The computer system may include a network interface 512 for connecting to a network. Other known electronic components may be added or substituted in the computer system.
What has been described and illustrated herein is an example along with some of its variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the spirit and scope of the subject matter, which is intended to be defined by the following claims—and their equivalents—in which all terms are meant in their broadest reasonable sense unless otherwise indicated.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2012/035787 | 4/30/2012 | WO | 00 | 7/30/2014 |