The present invention relates generally to computer implemented database systems and, more particularly, to a method and system for providing path-level access control to structured documents stored in a database system.
Structured documents are documents which have nested structures. Documents written in Extensible Markup Language (XML) are structured documents. XML is quickly becoming the standard format for delivering information over the Internet because it allows the user to design a customized markup language for many classes of structured documents. For example, a business can easily model complex structures such as purchase orders in XML form and send them for further processing to its business partners. XML supports user-defined tags for better description of nested document structures and associated semantics, and encourages the separation of document content from browser presentation.
As more and more businesses present and exchange data in XML documents, database management systems (DBMS) have been developed to store, query and retrieve these documents, which are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives, for semi-permanent storage. Some DBMSs, known as relational databases, store and query the documents utilizing relational techniques, while other DBMSs, known as native databases, store the documents in their native formats. XML documents are typically grouped into a collection of similar or related documents. Thus, for example, a group of purchase orders can form a collection.
Once a collection of documents is stored in the database, relational or native, it is potentially available to large numbers of users. Therefore, data security becomes a crucial concern. In particular, the DBMS must be able to control, i.e., deny or grant, access to the data by the user. In a conventional relational DBMS where the data is stored in rows and columns in tables, security is generally directed to the table level, i.e., access to a table is controlled. While this may be sufficient for relational data, it is inadequate for controlling access to a collection of XML documents because an XML document stored in the database contains information that is much more diverse than data stored in rows in tables.
Access control for XML documents is fine-grained; that is, access to each node in an XML document is controlled. The term “node” is used in the Document Object Model (DOM) sense, which is a standard XML construct well known to those skilled in the art. In that construct, the XML document is represented by a plurality of nodes that form a hierarchical node tree. Each node of the XML document is identified by a path that defines a hierarchical relationship between the node and its parent node(s). Thus, fine-grained access control to the nodes of an XML document is referred to as path-level access control.
For example, if an administrator wanted to limit access to a “salary” node in all documents in a collection “all_employees,” the administrator would generate the following statement:
Deny read access on “/employee/salary” in collection “all_employees” to group non-managers
This statement would deny access to all salary nodes with path “/employee/salary” in all documents in collection “all_employees.” This type of statement is referred to as an access control rule. A set of access control rules directed to a collection of documents is referred to as an access control policy.
While it is possible to perform path-level access control evaluation by utilizing access control rules, such evaluation is relatively expensive because the DBMS must evaluate each access control rule to determine whether a user should be granted or denied access to data in a node. This process becomes prohibitive when the number of access control rules in a policy increases. Nevertheless, the alternative, i.e., coarse-grained access control or table level access control, is unacceptable.
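The cost argument can be made concrete with a small Python sketch of conventional rule-by-rule evaluation. The `Rule` class, the deny-overrides convention, and the treatment of a rule's path as also covering its descendants are illustrative assumptions rather than details taken from this description; the point is only that every request scans the entire policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: grant or deny `action` on `path` to `subject`.
    path: str
    subject: str
    action: str
    grant: bool


def naive_check(policy, user_groups, action, path):
    """Decide access by scanning every rule; return (granted, rules_examined).

    Assumed conventions: a rule on a path also covers its descendants,
    and any applicable deny overrides any applicable grant.
    """
    granted = denied = False
    for rule in policy:
        covers = path == rule.path or path.startswith(rule.path + "/")
        if covers and rule.subject in user_groups and rule.action == action:
            if rule.grant:
                granted = True
            else:
                denied = True
    return granted and not denied, len(policy)


policy = [
    Rule("/employee", "employees", "read", True),
    Rule("/employee/salary", "non-managers", "read", False),
]
```

With this policy, `naive_check(policy, {"employees", "non-managers"}, "read", "/employee/salary")` returns `(False, 2)`: the answer is correct, but both rules were examined, and the work grows with every rule added to the policy.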
Accordingly, a need exists for an improved method and system for providing path-level access control for structured documents stored in a database. The method and system should be integrated (or capable of being integrated) with an existing database system in order to use the existing resources of the database system. The present invention addresses such a need.
The present invention is directed to an improved method and system for performing path-level access control evaluation for a structured document in a collection, where the structured document comprises a plurality of nodes and each of the nodes is described by a path. The method comprises providing a cache for temporarily storing a cache entry for a path associated with a node of the plurality of nodes, receiving a query that includes a request to access the node, checking the cache entry for the path associated with the node, and determining whether to grant access to the node based on the cache entry.
The present invention relates generally to computer implemented database systems and, more particularly, to an improved method and system for providing path-level access control for structured documents stored in a database. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiment and the generic principles and features described herein will be readily apparent to those skilled in the art. For example, the following discussion is presented in the context of a DB2® database environment available from IBM® Corporation. It should be understood that the present invention is not limited to DB2 and may be implemented with other database management systems. Thus, the present invention is to be accorded the widest scope consistent with the principles and features described herein.
According to a preferred embodiment of the present invention, a structured document is parsed into a plurality of nodes. Each node is then associated with a path that describes the node's hierarchical relationship to its parent node(s). An Access Control mechanism in the DBMS receives an access control policy, comprising at least one access control rule, for the structured document in the collection and generates for each of the paths associated with the nodes in the document a value expression. The value expression is an executable statement which describes the access control rule for that path, i.e., who is granted or denied access to data in that path. The value expression can be data-dependent if one of the at least one access control rules on which the value expression is based is data-dependent. If the value expression is not data-dependent, i.e., none of the at least one access control rules on which the value expression is based is data-dependent, then a result of an evaluation of the value expression is the same at all times for a given user, and the result can be stored in a cache.
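A minimal Python sketch of a value expression, assuming it can be modeled as the combination of the rule conditions affecting one path, each being a predicate over the user and the document plus a flag saying whether it reads the document. `RuleCondition`, `ValueExpression`, the user `"mgr"`, and the deny-overrides combination are hypothetical names and conventions chosen for illustration. The sketch shows the property the caching relies on: the expression is data-dependent exactly when at least one contributing rule is.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class RuleCondition:
    """One access control rule reduced to its sign and a test (hypothetical)."""

    grant: bool
    test: Callable[[str, Mapping[str, str]], bool]  # (user, document) -> does the rule apply?
    data_dependent: bool  # True if `test` must read the document


@dataclass(frozen=True)
class ValueExpression:
    """All rules affecting one path, combined into a single executable check."""

    rules: Sequence[RuleCondition]

    @property
    def data_dependent(self) -> bool:
        # Data-dependent as soon as any contributing rule inspects the document.
        return any(r.data_dependent for r in self.rules)

    def evaluate(self, user: str, doc: Mapping[str, str]) -> bool:
        # Deny-overrides combination of the applicable rules (an assumption).
        applicable = [r for r in self.rules if r.test(user, doc)]
        return any(r.grant for r in applicable) and not any(not r.grant for r in applicable)


# Data-independent: depends only on who the user is.
murata = RuleCondition(True, lambda user, doc: user == "Murata", False)
# Data-dependent: must look at the document (modeled as a path -> value mapping).
manager_if_status_4 = RuleCondition(
    True, lambda user, doc: user == "mgr" and doc.get("/employee/status") == "4", True
)
```

`ValueExpression((murata,))` is data-independent, so its result for a given user could be cached; adding `manager_if_status_4` makes the combined expression data-dependent, and its result then varies with the document.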
During query execution when the user submits a request to access a node in the document, the DBMS evaluates a value expression for a path associated with the requested node, and if the value expression is not data-dependent, the result of the evaluation is stored in the cache. Thus, if the DBMS is required to check the path again during the session, e.g., the user submits a second query, the DBMS refers to the result in the cache instead of re-evaluating the value expression.
The preferred embodiment of the present invention presents several advantages over the conventional fine-grained access control process. The DBMS evaluates one value expression for a path instead of a plurality of access control rules making access control evaluation more efficient and faster. In addition, by storing the evaluation result of a non data-dependent value expression in cache, the number of evaluations the DBMS performs is minimized, further optimizing the access control evaluation process. Moreover, according to the preferred embodiment of the present invention, access control evaluation is performed during run time, as opposed to compile time, thereby allowing an administrator to change the access control policy at run time without having to recompile the query.
To describe further the present invention, please refer to
The server computer 104 uses a data store interface (not shown) for connecting to the data sources 106. The data store interface may be connected to a database management system (DBMS) 105, which supports access to the data store 106. The DBMS 105 can be a relational database management system (RDBMS), such as the DB2® system developed by IBM Corporation, or it also can be a native XML database system. The interface and DBMS 105 may be located at the server computer 104 or may be located on one or more separate machines. The data sources 106 may be geographically distributed.
The DBMS 105 and the instructions derived therefrom all comprise instructions which, when read and executed by the server computer 104, cause the server computer 104 to perform the steps necessary to implement and/or use the present invention. While the preferred embodiment of the present invention is implemented in the DB2® product offered by IBM Corporation, those skilled in the art will recognize that the present invention has application to any DBMS, whether the DBMS 105 is relational or native. Moreover, those skilled in the art will recognize that the exemplary environment illustrated in
According to the preferred embodiment of the present invention, the DBMS 105 includes a cache 800 and access control policies 107 that are authored by an administrator 108 or some other authorized personnel. Each access control policy 107 describes the security rules pertaining to data stored in the database. The DBMS 105 also comprises an Access Control mechanism 200 that provides path-level access control to structured documents stored on disk. Storing data “on disk” refers to storing data persistently, for example, in the data store 106. While the Access Control mechanism 200 is shown as a subcomponent of the DBMS 105, those skilled in the art recognize that the Access Control mechanism 200 can also be a separate module coupled to the DBMS 105. Such a configuration would fall within the scope of the present invention.
The access control policy 107 for the collection comprises at least one access control rule. Each access control rule typically defines a subject to which the rule applies, an action and a path. The subject can be a user's name or a group of users. The action can be, but is not limited to, a read, an update, a create or a delete action. The path identifies the node to which the rule applies. For example, the following access control rule:
</bib/book/title,{Murata},+read>
provides that the user Murata is allowed to read information at the title element node described by the path “/bib/book/title.” This access control rule is data-independent because user Murata is allowed such access for all documents in the collection. The access control rule can also include predicates such that a user's access to a particular node is data-dependent. For example, an access control rule that reads:
Grant read access on “/employee/salary” in collection “IMB_employees” to Group Manager if “/employee/status=4”
is data-dependent because in order to evaluate whether the user has access to the element node “salary,” the DBMS 105 must access the document itself, e.g., determine whether element node “status” is equal to four (4). The distinction between data-dependent and data-independent access control rules will become relevant in the discussion describing access control evaluation below.
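The tuple notation used for the Murata rule can be parsed with a few lines of Python. This is a sketch under stated assumptions: the grammar `<path,{subjects},±action>` is inferred from that single example, and treating any `[...]` predicate in the path as the mark of a data-dependent rule is a simplification of the distinction drawn above. A condition like the one in the salary rule would be written here, hypothetically, as `/employee[status=4]/salary`.

```python
import re
from dataclasses import dataclass

# Assumed grammar: <path,{subject, subject, ...},+action> or ...,-action>
RULE_RE = re.compile(
    r"^<(?P<path>[^,]+),\{(?P<subjects>[^}]*)\},(?P<sign>[+-])(?P<action>\w+)>$"
)


@dataclass(frozen=True)
class AccessRule:
    path: str
    subjects: tuple
    grant: bool
    action: str

    @property
    def data_dependent(self) -> bool:
        # Simplification: a bracketed predicate must be checked against the document.
        return "[" in self.path


def parse_rule(text: str) -> AccessRule:
    m = RULE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized rule: {text!r}")
    subjects = tuple(s.strip() for s in m["subjects"].split(",") if s.strip())
    return AccessRule(m["path"].strip(), subjects, m["sign"] == "+", m["action"])
```

For example, `parse_rule("</bib/book/title,{Murata},+read>")` yields a data-independent grant of `read` on `/bib/book/title` to the single subject `Murata`.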
Referring again to
The node tree would comprise the following element nodes:
The path to element node Q1 is:
/Questionnaire/Questions/Q1
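How a path is derived for every element node can be sketched with Python's standard `xml.etree.ElementTree` module. The one-question document in the usage line is a hypothetical stand-in for the questionnaire whose node tree is not reproduced here; attributes, text nodes, and namespaces are ignored for brevity, and repeated sibling elements collapse to one path because they share it.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def element_paths(xml_text: str) -> list[str]:
    """Return the distinct root-to-node paths of all element nodes, in document order."""
    root = ET.fromstring(xml_text)
    paths: list[str] = []
    seen: set[str] = set()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        if path not in seen:  # sibling elements with the same tag share one path
            seen.add(path)
            paths.append(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths
```

`element_paths("<Questionnaire><Questions><Q1>Age?</Q1></Questions></Questionnaire>")` returns `["/Questionnaire", "/Questionnaire/Questions", "/Questionnaire/Questions/Q1"]`, the last entry being the path shown above.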
Referring again to
</bib/book[@title=“security”], GROUP Admin,+read>
is transformed into:
(grant_read, ‘/bib/book’, equal($Group, Admin) & xpath(/bib/book[@title=“security”]))
The above format is called a normalized rule. After this step is finished, a set of normalized access control rules is generated.
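A minimal Python sketch of this normalization for group-based rules, matching the shape of the example above. The function names and the treatment of every bracketed predicate as an `xpath(...)` condition are assumptions; rules naming individual users, or predicates that themselves contain `]`, would need more handling.

```python
from __future__ import annotations

import re

PREDICATE_RE = re.compile(r"\[[^\]]*\]")


def normalize(path: str, group: str, sign: str, action: str) -> tuple[str, str, str]:
    """Return (head, target_path, condition) for a group-based rule."""
    head = ("grant_" if sign == "+" else "deny_") + action
    target = PREDICATE_RE.sub("", path)  # drop XPath predicates to get the node path
    condition = f"equal($Group, {group})"
    if target != path:  # a predicate survives as an xpath(...) test on the document
        condition += f" & xpath({path})"
    return head, target, condition


def render(rule: tuple[str, str, str]) -> str:
    """Print a normalized rule in the (head, 'path', condition) form."""
    head, target, condition = rule
    return f"({head}, '{target}', {condition})"
```

Applied to the path `/bib/book[@title="security"]`, group `Admin`, and `+read`, the sketch produces the head `grant_read`, the target `/bib/book`, and the condition `equal($Group, Admin) & xpath(/bib/book[@title="security"])`.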
Next, in step 404, the translator 300 generates and populates a condition table.
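The layout of the condition table is given in a figure not reproduced here. As a hedged sketch, assume it simply assigns each distinct condition expression a ConditionID, and that a modified normalized rule is a normalized rule whose condition has been replaced by that ID. The `ConditionTable` class, the `C1`, `C2`, … ID format, and the `modify` helper are hypothetical.

```python
from __future__ import annotations


class ConditionTable:
    """Assign a ConditionID to each distinct condition expression (hypothetical layout)."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}
        self.rows: list[tuple[str, str]] = []  # (ConditionID, condition expression)

    def id_for(self, condition: str) -> str:
        if condition not in self._ids:
            cid = f"C{len(self.rows) + 1}"
            self._ids[condition] = cid
            self.rows.append((cid, condition))
        return self._ids[condition]


def modify(table: ConditionTable, rule: tuple[str, str, str]) -> tuple[str, str, str]:
    """Replace a normalized rule's condition with its ConditionID."""
    head, path, condition = rule
    return head, path, table.id_for(condition)
```

Rules sharing a condition then share an ID, so the condition text is stored once and later steps can compare IDs instead of expressions.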
Referring again to
The propagation rules above are illustrative and not exclusive. Those skilled in the art recognize that different propagation rules can be implemented, and that the method and system of the present invention are not limited thereby.
In step 406, for each modified normalized rule, the translator 300 applies the propagation rules and identifies each path that is affected by the modified normalized rule. For example, if the modified normalized rule associated with a node is a “GRANT read” to user A, the modified normalized rule propagates up and down from the node. User A has “read” access to all descendant nodes and to all ancestor element nodes. Thus, the translator 300 identifies the paths associated with the descendant and ancestor element nodes as being accessible by user A. If more than one modified normalized rule affects a particular path, those rules are combined for the path in step 406.
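A sketch of step 406 in Python. Only the grant behaviour is taken from the text above (a grant reaches the node, its descendants, and its ancestors); the choice to propagate a deny to the node and its descendants only stands in for the propagation rules of Table 1, which are not reproduced here, and is an assumption. Rules reaching the same path are collected in one list per path, which is where they would be combined.

```python
from collections import defaultdict


def is_below(ancestor: str, path: str) -> bool:
    """True if `path` lies strictly beneath `ancestor`."""
    return path.startswith(ancestor + "/")


def propagate(paths, rules):
    """Map each affected path to the (head, ConditionID) pairs that reach it."""
    affected = defaultdict(list)
    for head, rule_path, cid in rules:
        for p in paths:
            reached = p == rule_path or is_below(rule_path, p)  # the node and its descendants
            if head.startswith("grant"):
                reached = reached or is_below(p, rule_path)  # grants also reach ancestors
            if reached:
                affected[p].append((head, cid))
    return dict(affected)
```

With paths `/a`, `/a/b`, `/a/b/c`, `/a/d`, a grant on `/a/b` reaches `/a`, `/a/b`, and `/a/b/c` but not the unrelated sibling `/a/d`.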
Next, in step 408, the translator 300 optimizes the rules by minimizing repeated value expressions in the output. In a preferred embodiment, reference notations and supplemental value expressions are used to optimize the rules. For the reference notation, the translator 300 compares the ConditionID specified for a path associated with a node to the ConditionID of the path associated with its parent node. If the ConditionIDs are identical, then the ConditionID in the child node is replaced with “ref(1, . . . /)”, indicating that the condition for the child node is identical to the condition of the parent node and, therefore, there is no need to reevaluate the value expression for the child. In the case of a sibling reference, “ref(2, . . . /sibling-node)” is used to express the reference, where “. . . /sibling-node” is the relative path.
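The parent-reference half of this optimization can be sketched in Python as follows, assuming ConditionIDs have already been combined into one ID per path. The parent-relative path that the text writes with dots is rendered here as `../`, the constant `PARENT_REF` is a hypothetical name, and sibling references are left out; all are simplifications for illustration.

```python
import posixpath

PARENT_REF = "ref(1, ../)"  # parent reference, with the relative path rendered as ../


def apply_parent_refs(conditions):
    """Replace a path's ConditionID with a parent reference when it equals the parent's."""
    optimized = {}
    for path, cid in conditions.items():
        parent = posixpath.dirname(path)
        # Compare against the parent's original ID, not its (possibly replaced) output.
        if conditions.get(parent) == cid:
            optimized[path] = PARENT_REF
        else:
            optimized[path] = cid
    return optimized
```

For `{"/bib": "C1", "/bib/book": "C2", "/bib/book/title": "C2"}`, only `/bib/book/title` becomes a reference, because it is the only path whose ConditionID matches its parent's.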
A supplemental value expression is an additional value expression associated with a path that describes the access rule for any descendant path that exists or may exist in the future. For example, if user B has read access on path “a/b,” then according to the propagation rules presented in Table 1, user B has read access to any descendant path associated with a node (descendant path/node) under “a/b.” A descendant path/node may already exist (i.e., a path and value expression have been generated for the descendant node), or it may not yet exist because, for example, a document containing this path has not yet been added to the collection and processed by the Access Control mechanism 200. In this case, the translator 300 generates a supplemental value expression for path “a/b” that indicates that user B has read access to any descendant path/node. Thus, if and when a new descendant path/node is introduced, there is no need to generate a value expression for the new path.
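A hedged Python sketch of how a supplemental value expression could be consulted for a path that has no value expression of its own: walk up the ancestors and use the nearest supplemental expression found. The dictionaries, the lookup order, and the expression strings (such as `[$User='B']`) are assumptions made for illustration; the expressions are treated as opaque values here.

```python
import posixpath


def resolve_expression(path, expressions, supplemental):
    """Return the value expression governing `path`, or None if nothing applies.

    A path with its own value expression uses it; otherwise the nearest
    ancestor carrying a supplemental value expression decides.
    """
    if path in expressions:
        return expressions[path]
    current = path
    while True:
        parent = posixpath.dirname(current)
        if parent == current:  # walked past the root
            return None
        current = parent
        if current in supplemental:
            return supplemental[current]
```

A path such as `/a/b/new/leaf`, introduced by a document added after the expressions were generated, resolves to the supplemental expression stored at `/a/b` without any new generation step.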
Those skilled in the art will readily appreciate that reference notations and supplemental value expressions are but two ways to optimize the rules. Other techniques can be utilized to further optimize the rules.
After optimization, a value expression generator 302 transforms each modified normalized access control rule into a value expression for the path, via step 410. In a preferred embodiment, the value expression generator 302 performs a syntactical conversion and an addition of a “!p” notation. For the syntactical conversion, each condition expression is transformed into syntax defined as XPath step qualifier. For example, a condition expression:
equal($Group, Admin) & xpath(/bib/book[@year=2000])
is transformed into:
[$Group=‘Admin’ and @year=2000] for the path “/bib/book”.
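The syntactical conversion can be sketched in Python for the two condition forms that appear in this description, `equal(...)` and `xpath(...)`. The parsing is deliberately narrow and is an assumption, not the generator's actual grammar: it expects terms joined by `&`, and an `xpath` term whose bracketed predicate applies to the target path.

```python
import re

EQUAL_RE = re.compile(r"^equal\((\$\w+),\s*([^)]+)\)$")
XPATH_RE = re.compile(r"^xpath\((?P<path>[^\[]+)\[(?P<pred>.*)\]\)$")


def to_step_qualifier(condition: str, target_path: str) -> str:
    """Rewrite a condition expression as an XPath step qualifier for `target_path`."""
    parts = []
    for term in (t.strip() for t in condition.split("&")):
        if m := EQUAL_RE.match(term):
            # equal($Var, value) -> $Var='value'
            parts.append(f"{m[1]}='{m[2].strip()}'")
        elif (m := XPATH_RE.match(term)) and m["path"] == target_path:
            # xpath(/target[pred]) -> pred, since the qualifier is attached to /target
            parts.append(m["pred"])
        else:
            raise ValueError(f"unsupported term: {term!r}")
    return "[" + " and ".join(parts) + "]"
```

`to_step_qualifier("equal($Group, Admin) & xpath(/bib/book[@year=2000])", "/bib/book")` returns `[$Group='Admin' and @year=2000]`, the qualifier shown above.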
For the “!p” notation, the value expression generator 302 generates “!p” notation wherever the reference notation is specified. For example, if ref(1, . . . /) is specified at “/bib/book/title”, then the value expression is extended to “if !p then [ref(1, . . . /)]”. When compiled and executed, the “!p” notation indicates to the DBMS 105 that the value expression for that path is the same as that for the path of the parent node and, therefore, the DBMS 105 does not need to reevaluate the value expression if access to the parent has already been granted. After step 410 is finished, a value expression is generated for each path.
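One way to picture the effect of the “!p” notation is a root-to-node walk that reuses a parent's result whenever the child's expression is a parent reference. In the Python sketch below, `PARENT_REF`, modeling value expressions as functions of the user, and stopping at the first inaccessible ancestor are all assumptions made for illustration.

```python
PARENT_REF = "ref(1, ../)"  # hypothetical marker: "same condition as my parent"


def check_path(path, expressions, user):
    """Return (granted, evaluations) for the node at `path`.

    Walks from the root so each parent's result is known before its child;
    a child whose expression is PARENT_REF keeps the parent's result without
    evaluating anything, which is the saving the "!p" notation provides.
    """
    granted = True
    evaluations = 0
    prefix = ""
    for step in path.strip("/").split("/"):
        prefix += "/" + step
        expr = expressions[prefix]
        if expr == PARENT_REF:
            continue  # inherit the parent's (already granted) result
        evaluations += 1
        granted = expr(user)
        if not granted:
            return False, evaluations  # assumption: a denied ancestor stops the walk
    return granted, evaluations
```

Checking `/bib/book/title` evaluates the expressions of `/bib` and `/bib/book` only; the title's parent reference costs nothing.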
The above described value expression generation process can further be described through the following simple example. Suppose the following XML document is representative of a collection:
and the following rules make up the access control policy for the collection:
Here, rules 1-4 are transformed into normalized rules. Each normalized rule includes a head, a path expression and a condition. The normalized rules are as follows:
Next, the translator 300 generates and populates the following condition table:
The translator 300 then converts the normalized rules into the following modified normalized rules:
Here, each rule is propagated through each path in the node tree (generated by the path generator 202 in step 306 of
Optimization (step 408)
Here, reference notations are used to minimize repetitive outputs.
Value Expression Generation (step 410)
Here, each output statement is transformed into a value expression based on the condition expression in the condition table. A value expression is created for each path.
Referring again to
To describe further how the cache 800 and value expressions 604 are utilized by the DBMS 105 during access control evaluation at run time, please refer now to
According to a preferred embodiment of the present invention, during the execution of the query (during run time), the DBMS 105 performs an access control check to determine whether the user is authorized to access the requested node. The DBMS 105 does this by checking the cache 800 in step 704.
If the cache entry is a deny (D) statement (step 706), then the DBMS 105 denies access to the node in step 708. If the cache entry is a grant (G) statement (step 710), the DBMS 105 grants access to the node in step 718. If the cache entry is a data-dependent (DD) statement (step 714), the DBMS 105 accesses the path table 600 and evaluates the value expression 604 corresponding to the path 602 associated with the requested node, via step 716. If the result of the evaluation in step 716 is a grant, then the DBMS 105 grants access to the node in step 718, otherwise access is denied in step 719.
If the cache entry is an Unknown (U) statement (step 722) the DBMS 105 accesses the path table 600 and evaluates the value expression 604 corresponding to the path 602 associated with the requested node in step 726 via node B. Referring now to
After the DBMS 105 grants or denies access to the node in steps 708, 718 and 719, or after the DBMS 105 changes the cache entry from an Unknown (U) statement to a data-dependent (DD) or deny (D) or grant (G) statement in steps 736 and 738, the DBMS 105 receives a next query from the user in step 720 (
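The run-time check of steps 704 through 738 can be summarized as a small state machine over the four cache entries. In this Python sketch, `Entry`, `ValueExpr`, and `SessionCache` are hypothetical names; it assumes one cache per user session, and that when an Unknown entry is resolved, the evaluation result also decides the current request. Only grant and deny results are reused, so a path marked data-dependent is re-evaluated against each document, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Mapping


class Entry(Enum):
    GRANT = "G"
    DENY = "D"
    DATA_DEPENDENT = "DD"
    UNKNOWN = "U"


@dataclass(frozen=True)
class ValueExpr:
    """Stand-in for a value expression 604 stored in the path table 600."""

    evaluate: Callable[[str, Mapping[str, str]], bool]
    data_dependent: bool


@dataclass
class SessionCache:
    """Per-session cache 800 holding one entry per path; absent means Unknown."""

    path_table: Mapping[str, ValueExpr]
    entries: dict[str, Entry] = field(default_factory=dict)
    evaluations: int = 0

    def check(self, user: str, path: str, doc: Mapping[str, str]) -> bool:
        entry = self.entries.get(path, Entry.UNKNOWN)
        if entry is Entry.GRANT:
            return True
        if entry is Entry.DENY:
            return False
        # DD or U: evaluate the value expression for this path.
        expr = self.path_table[path]
        self.evaluations += 1
        granted = expr.evaluate(user, doc)
        if entry is Entry.UNKNOWN:
            # Record the outcome; only data-independent results become G or D.
            if expr.data_dependent:
                self.entries[path] = Entry.DATA_DEPENDENT
            else:
                self.entries[path] = Entry.GRANT if granted else Entry.DENY
        return granted
```

A second query on a data-independent path is answered from the cache without evaluation, whereas each query on a data-dependent path evaluates its value expression against the current document.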
While the process described above utilizes the cache 800 in cooperation with the value expressions 604, those skilled in the art recognize that the cache 800 can also be utilized with the underlying access control rules. Thus, rather than evaluating the value expression 604, the DBMS 105 can evaluate the access control policy affecting the path and the result of that evaluation is stored as a cache entry 804 in the cache 800.
An improved method and system for performing path-level access control evaluation for a structured document in a collection has been disclosed. The preferred embodiment of the present invention presents several advantages over the conventional fine-grained access control process. The DBMS 105 evaluates one value expression 604 for a path 602 instead of a plurality of access control rules making access control evaluation more efficient and faster. In addition, by temporarily storing the evaluation result of a data-independent value expression in cache 800, the number of evaluations the DBMS 105 performs is minimized, further optimizing the access control evaluation process. Moreover, according to the preferred embodiment of the present invention, access control evaluation is performed during run time, as opposed to compile time, thereby allowing an administrator to change the access control policy at run time without having to recompile the query. In addition, by performing access control evaluation during run time, the DBMS 105 is able to hide data in a document.
Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments and those variations would be within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.
Number | Date | Country |
---|---|---|
20050050010 A1 | Mar 2005 | US |