The present invention relates to techniques for processing updates to XML data, and, more particularly, to methods and apparatus for processing updates to XML data as queries.
It is often desired to rewrite an update as a query that returns the same data as would be produced by performing the update in place. Among other reasons, this is needed to define a view in terms of updates while avoiding the destructive impact of the updates on the source data. For example, consider an exemplary XML document T0 depicted in
A number of user groups may query the document T0 simultaneously, each with a different access-control policy that prevents disclosure of price information from suppliers of certain countries. To enforce the access control, each group is provided with a security view that returns a document containing all the data from T0 except the sensitive price information. These views should be virtual because it may be exceedingly costly to create and maintain a different (materialized) view for each user group. Unfortunately, such views are far from trivial to write by hand in, e.g., XQUERY, as the price information may appear at arbitrary depths in T0. In contrast, it is conceptually straightforward to “delete” the price data in a view, perhaps with a simple statement such as “delete //supplier[country=‘c1’ ∨ . . . ∨ country=‘cn’]/price”. Note that the intention is not to delete this data in the source; instead, it is merely to define the security view of a client with the update syntax, which is in turn rewritten into an equivalent query. Then, user queries posed on the view can be answered by composing the queries and the view and evaluating the composed queries directly on the original T0.
Another user may be concerned that a planned tariff will cause a 15% increase in the price of parts imported from a number of countries, and wants to find out the new costs of those parts affected by the changes. However, the user cannot update T0 in place before the new tariff policy takes effect. One way to achieve this update is by creating a separate copy of T0, updating the copy and then computing the costs by posing queries on the updated copy. A more efficient approach is to define a virtual view of T0 in terms of the updates by rewriting the updates into a view query, and thus avoid copying the entire T0. Then, one can compute the costs by composing queries with the view using the standard view querying methods, so that the composed queries can be evaluated against the original T0.
Another set of users may pose queries and updates on T0, while T0 may itself be actually a virtual document defined through data integration. In this case, there may be no sensible notion of performing an update on the virtual data; but one could still obtain a new document that would result from such an update on the document. Again, translating the update into a query and performing query composition will produce the desired result.
While a number of techniques have been proposed or suggested for rewriting updates into queries for relational databases (cf., S. Abiteboul et al., Foundations of Databases, Ch. 1 (Addison-Wesley, 1995)), computing complement queries becomes challenging for XML due to the nested nature of XML documents. A need therefore exists for methods and apparatus for rewriting updates as an equivalent query on XML data. That is, given an update u that needs to be applied to an XML document T to produce T′, the update u is rewritten as a query Quc, such that Quc(T)=T′. Thus, a (virtual) view can be defined directly in terms of update syntax.
Generally, methods and apparatus are provided for processing updates to an XML document. According to one aspect of the invention, updates are converted into one or more complement queries that can be performed on the XML document. The complement queries provided by the present invention allow (i) virtual views of XML data to be updated; (ii) updates and queries to be composed; and (iii) the XML document to be updated using an XML query engine. In one implementation, the XML document is recursively processed to determine, for each node, whether the node is affected by the update, and the update is implemented at the affected nodes.
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.
The present invention provides methods and apparatus for processing updates to XML data as queries on the data. According to one aspect of the invention, methods and apparatus are provided for rewriting of XML updates into queries. That is, given an update u over an XML document T, a query Quc, referred to as a complement query of u, is derived such that Quc(T) returns the same document as would be produced by updating T in place with u. Thus, one can define a (virtual) view in terms of updates while avoiding the destructive impact of updates. Furthermore, queries can be directly composed with updates. The need for this is evident in, e.g., XML security, integration and update testing. A number of alternative algorithms are provided for computing complement queries from a class of XML updates commonly found in practice. Algorithms are disclosed for computing a single complement query from a sequence of updates, based on incremental computation. Complement queries computed in accordance with the present invention can be evaluated in time linear in the size of the XML document.
Among other benefits, it is easier to define certain views with updates than writing directly in, e.g., XQUERY. More importantly, other queries can be composed with the update (in its query or view form) by leveraging query composition techniques. Quc is referred to as a complement query of u.
According to another aspect of the invention, updates can be rewritten using a naive approach to rewriting a class of XML updates into complement queries in XQUERY. Defined in terms of XPATH, the disclosed update language is the core of many known update languages, and can express many updates commonly found in practice. The naive algorithm produces complement queries that are efficient when only a small fraction of the document is touched by u.
According to yet another aspect of the invention, a more optimized approach is presented for expressing Quc in XQUERY. Generally, this top-down approach yields a query Quc that processes u via a single top-down traversal of the input XML tree T, identifying the nodes to be updated based on a notion of selecting non-deterministic finite state automata (NFA) and a function checkp( ) that checks the satisfaction of XPATH qualifiers in u involved at each node encountered.
Another aspect of the invention provides a bottom-up technique for implementing checkp( ) of Quc that evaluates all the XPATH qualifiers in u via a single bottom-up traversal of T, in case the query processor does not handle complex qualifiers well. Thus, the evaluation of Quc requires at most two passes of T: a bottom-up pass for evaluating qualifiers followed by a top-down pass for selecting nodes to be updated.
In addition, another aspect of the invention produces a complement query \(Q^{c}_{\vec{u}}\) for a sequence of updates \(\vec{u} = u_1, \ldots, u_k\) over a document T. This is required for, e.g., defining a view in terms of a sequence of updates, and it allows the cost of processing a complement query to be amortized over a sequence \(\vec{u}\) of updates. It is shown that the sequence \(\vec{u}\) of updates can be batched into a single complementary query \(Q^{c}_{\vec{u}}\) such that \(Q^{c}_{\vec{u}}(T) = u_k(\ldots(u_1(T))\ldots)\). An algorithm is also provided to compute \(Q^{c}_{\vec{u}}\) that handles \(\vec{u}\) based on incremental computation. Such a complement query combines the evaluation of the XPATH qualifiers in \(\vec{u}\) via a single pass of T. Then, while processing updates in \(\vec{u}\) one by one, for each update \(Q^{c}_{\vec{u}}\) only inspects qualifiers associated with the portion of data changed by previous updates in \(\vec{u}\), instead of conducting two passes of the entire T for each update.
The disclosed techniques for rewriting XML updates into complement queries have several salient features. First, complement queries Quc produced by the present invention (for a single update and a sequence of updates) have a linear-time data complexity that is the best one can expect since it is the lower bound for evaluating XPATH queries embedded in u alone. In addition, the algorithms accommodate referential transparency (side-effect free) of XQUERY and can be readily coded in XQUERY. Further, the disclosed techniques provide the ability to define (virtual) views in terms of updates and to compose queries with updates without side effects on the source data. In addition, the disclosed techniques suggest techniques potentially useful for implementing XML updates.
It is noted that complement queries are evaluated on top of an XML query processor at the source level, and thus it is unreasonable to expect that an implementation of updates via complement queries outperforms direct implementation of updates in an XML query processor. As a byproduct, however, the present invention yields a convenient approach to supporting XML update functionality when update support is not available on a particular platform. For XML data stored as a file in a file system, the lower bound of time required to update a document is linear in the size of the data (for uploading the data from and re-serializing out to the file system), which is comparable with the efficiency of complement queries produced by the present algorithms. Furthermore, translating updates to queries allows a uniform optimizer to be used for both queries and updates.
XML Updates
As the standard language for XML updates is not yet available, a class of updates is considered that is supported by most proposals for XML update languages. This class of updates is defined in terms of XPATH (J. Clark and S. DeRose, XML Path Language (XPath), W3C Working Draft (November 1999)).
1. XPath
The exemplary embodiments of the present invention use core XPATH (G. Gottlob et al., “Efficient Algorithms for Processing XPath Queries,” VLDB (2002)) with downward modality. This class of queries, referred to as X, is defined by:
p ::= ε | l | * | p/p | p//p | p[q],
q ::= p | p = ‘s’ | label( ) = l | q ∧ q | q ∨ q | ¬q,
where ε, l and * denote the empty path, a label (tag) and a wildcard, respectively; ‘∪’, ‘/’ and ‘//’ stand for union, child-axis and descendant-or-self-axis, respectively; and q in p[q] is called a qualifier, in which s is a constant (string value), and ‘∧’, ‘∨’ and ‘¬’ denote conjunction, disjunction and negation, respectively. For //, p1/ //p2 is abbreviated as p1//p2.
An XPATH query p is evaluated at a context node v in an XML tree T, and its result is the set of nodes of T reachable via p from v, denoted by v∥p∥.
2. XML Updates
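With the class X of XPATH expressions, an XML update language is defined, denoted by U, using the syntax of P. Lehti, “Design and Implementation of a Data Manipulation Processor for an XML Query Processor,” Technical Report, Technical University of Darmstadt, Diplomarbeit (2001). The language supports four operations: insert const-expr into p, delete p, replace p with const-expr, and rename p as s, where p is an X expression, const-expr evaluates to an XML element e, and s is a label.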
Generally, given an XML tree T with root r, the insert operation finds all the elements reachable from r via p in T, and adds the new element e given by const-expr as the last child of each of those elements. More specifically, (1) it computes r∥p∥; (2) for each element v in r∥p∥, it adds e as the rightmost child of v.
Similarly, the delete operation first computes r∥p∥ and then removes all the nodes in r∥p∥ (along with their subtrees) from T. The replace operation computes r∥p∥ and then replaces each v in r∥p∥ with e defined by const-expr. Finally, the rename operation computes r∥p∥ and for each v in r∥p∥, changes the label of v to s. The new tree obtained by an update u is denoted as u(T).
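The semantics of the four operations can be illustrated with a minimal Python sketch. It is not the disclosed XQUERY rewriting: the `Node` class and the predicate `sel` are hypothetical stand-ins introduced here, with `sel(v)` assumed to answer whether v belongs to r∥p∥. Each function returns a new tree and leaves its input untouched, mirroring the side-effect-free reading u(T).

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Node:
    """Immutable XML element: a label, ordered children, optional text."""
    label: str
    children: Tuple["Node", ...] = ()
    text: str = ""


# Hypothetical stand-in for the test "v is in r[[p]]"; a real system
# would evaluate the XPATH expression p from the root r instead.
Selected = Callable[[Node], bool]


def insert(t: Node, sel: Selected, e: Node) -> Node:
    """Add e as the rightmost child of every selected node."""
    kids = tuple(insert(c, sel, e) for c in t.children)
    return Node(t.label, (kids + (e,)) if sel(t) else kids, t.text)


def delete(t: Node, sel: Selected) -> Node:
    """Drop every selected node with its subtree (the root is kept)."""
    kids = tuple(delete(c, sel) for c in t.children if not sel(c))
    return Node(t.label, kids, t.text)


def replace(t: Node, sel: Selected, e: Node) -> Node:
    """Substitute e for every selected node (the root is kept)."""
    kids = tuple(e if sel(c) else replace(c, sel, e) for c in t.children)
    return Node(t.label, kids, t.text)


def rename(t: Node, sel: Selected, s: str) -> Node:
    """Change the label of every selected node to s."""
    kids = tuple(rename(c, sel, s) for c in t.children)
    return Node(s if sel(t) else t.label, kids, t.text)
```

In this sketch a selected root is neither deleted nor replaced; that restriction is a simplification of the sketch, not a statement about the update language U.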
Referring to the XML tree T0 of
(1) insert e into p1, where p1 is the X expression //part[pname=‘keyboard’]//part[¬supplier/sname=‘HP’ ∧ ¬supplier/price<15]; this is to first find every keyboard in T0, and then for each of its subparts that is supplied neither by HP nor at a price lower than $15 by any supplier, add e as a supplier;
(2) delete p2, where p2 is //part[pname=‘keyboard’]/subpart//supplier[¬sname=‘HP’ ∧ ¬price<15]; this is to remove from T0 the suppliers of all subparts of any keyboard except for supplier HP and those suppliers selling at a price lower than $15;
(3) replace p3 with e, where p3 is //part[pname=‘keyboard’]/supplier[sname=‘Compaq’]; this is to substitute e for the supplier Compaq of any keyboard;
(4) rename //country as address; this changes the label country to address for every country in T0.
Each operation may incur multiple changes at an arbitrary depth of T0, since the same part element may occur at different places of T0, due to the subpart hierarchy.
Computing Complement Queries
Three techniques are presented that, given an XML update u in the language U, compute a query Quc in XQUERY such that Quc(T)=u(T) for any XML document T. Quc is referred to as a complement query of u.
The first technique, referred to as the Naive Method, consists of a set of query templates in XQUERY. For an update u in U, one of these templates may be instantiated to form a complement query Quc. These templates demonstrate the feasibility of finding complement queries for XML updates. This method, however, may not work well when the set of nodes changed by the update is large.
The second technique, referred to as the Top Down Method, uses recursive XQUERY functions, and simulates the evaluation of an automaton on (the paths of) the tree. Combined with optimization techniques to be introduced in the next section, complement queries produced by this method are guaranteed to take at most linear time in the size of the document.
1. Naive Method
For any update u in U, one can construct a complement query Quc. To illustrate this, consider u=insert const-expr into p over a document T, where const-expr evaluates to an XML element, and p is an XPATH query. The update u can be rewritten into Quc in XQUERY, as shown in
Since doc(T)/p and const-expr in this template can be instantiated with arbitrary XQUERY expressions (not just queries in X or constant expressions), it is shown that for a wide variety of updates one can find a complement query. However, these queries are inefficient when the scope of the update is broad (i.e., when p is not very selective and |$xp| is large): in the worst case it takes quadratic time in the size of T, i.e., O(|T|²) time, unless the XQUERY engine optimizes the test n ∈ $xp.
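The cost behavior described above can be seen in a small Python analogue of the naive rewriting. This is a sketch under assumptions, not the XQUERY template itself: `Node`, `select` and the two `naive_insert` variants are hypothetical names, and `select` merely stands in for evaluating doc(T)/p. The first variant scans the list $xp at every node, which is where the quadratic worst case comes from; the second shows the kind of indexing an engine might apply to the test n ∈ $xp.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    children: Tuple["Node", ...] = ()


def select(t: Node, pred: Callable[[Node], bool]) -> List[Node]:
    """Stand-in for evaluating doc(T)/p: collect the matching nodes."""
    out = [t] if pred(t) else []
    for c in t.children:
        out.extend(select(c, pred))
    return out


def naive_insert(t: Node, xp: List[Node], e: Node) -> Node:
    """Copy the tree; append e under every node that occurs in xp."""
    kids = tuple(naive_insert(c, xp, e) for c in t.children)
    # Linear scan of xp at every node: O(|T| * |xp|) overall.
    if any(t is x for x in xp):
        kids += (e,)
    return Node(t.label, kids)


def naive_insert_indexed(t: Node, xp_ids: Set[int], e: Node) -> Node:
    """Same result, but the membership test is a constant-time lookup."""
    kids = tuple(naive_insert_indexed(c, xp_ids, e) for c in t.children)
    if id(t) in xp_ids:
        kids += (e,)
    return Node(t.label, kids)
```

Node identity (`is`, `id`) is used for membership because two distinct nodes of a document may have equal content.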
2. Restricted Top Down Method
A Restricted Top-Down Method is shown in
3. General Top Down Method
The disclosed top-down method, given an update u, produces a complement query Quc with linear asymptotic behavior, based on a notion of selecting NFA. Generally, for the X query p in u, the selecting NFA of p, denoted by Mp, is generated, which is a mild extension of NFA and is used for identifying nodes in r∥p∥. The query Quc maintains a set S of (current) states in Mp as it traverses the XML tree T top-down. For each encountered node n in T, n's label is used to change S to S′ according to the function nextStates( ) shown in
A. Constructing Mp
The selecting NFA Mp of an X query p is defined as follows. Observe that p=β1[q1]/ . . . /βk[qk], where βi is either a label l, the wildcard * or the descendant axis //. Mp=(K, Γ, δ, s, f), where (1) the set K of states consists of the start state s=(s0, [true]), and for each i ∈ [1, k], a state (si, [qi]) denoting the step βi with the qualifier [qi], where the final state f is (sk, [qk]); (2) the alphabet Γ consists of all the labels in p and the special wildcard *; (3) the transition function δ is defined as follows: for each i in [0, k−1], δ((si, [qi]), βi+1)=(si+1, [qi+1]) if βi+1 is a label or *, and δ((si, [qi]), ε)=(si+1, [qi+1]) and δ((si, [qi]), *)=(si, [qi]) if βi+1 is //.
Recall the X query p1 given above. The selecting NFA for p1 is depicted in supplier/sname=‘HP’ˆ
supplier/price<15].
A selecting NFA Mp has the following notable features. First, Mp has a semi-linear structure: the only cycles in Mp are self-cycles labeled * and introduced by //. Note that from any state (si, [qi]) at most two states can be reached via the δ function. Second, while Mp is based on the “selecting path” of p, it incorporates its qualifiers into the states, which, as discussed below, is effective in pruning unaffected subtrees. Third, Mp can be constructed in O(|p|²) time, and its size is bounded by O(|p|).
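The construction of Mp can be sketched in Python as follows. The encoding is an assumption made for illustration: states are the integers 0..k, qualifiers are kept as opaque strings, the constant `EPS` marks ε-transitions, and `build_selecting_nfa` is a hypothetical name. The transition rules follow the definition of δ given above.

```python
from typing import Dict, List, Optional, Tuple

EPS = "eps"  # marker for an epsilon transition (naming is an assumption)

# One step of p: (beta, qualifier), where beta is a label, "*" or "//"
# and the qualifier is None when the step carries no qualifier.
Step = Tuple[str, Optional[str]]


def build_selecting_nfa(steps: List[Step]) -> dict:
    """Build the selecting NFA for p = beta1[q1]/.../betak[qk]."""
    # State i carries qualifier quals[i]; the start state carries [true].
    quals = ["true"] + [q if q is not None else "true" for _, q in steps]
    delta: Dict[Tuple[int, str], int] = {}
    for i, (beta, _) in enumerate(steps):
        if beta == "//":
            delta[(i, EPS)] = i + 1   # leave the descendant step
            delta[(i, "*")] = i       # self-cycle: skip one more level
        else:
            delta[(i, beta)] = i + 1  # label or wildcard step
    return {"quals": quals, "delta": delta, "start": 0, "final": len(steps)}


# Usage: the selecting path of p1 = //part[q1]//part[q2].
nfa_p1 = build_selecting_nfa(
    [("//", None), ("part", "q1"), ("//", None), ("part", "q2")]
)
```

Each state has at most two outgoing transitions, so the table has size linear in the number of steps, in line with the O(|p|) size bound stated above.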
B. Next States
The function nextStates( ), shown in
Note that, to cope with the ε transitions in the NFA Mp, the ε-closure of S′ must be computed (line 4), which is the set of all the states reachable from any state of S′ via one or more ε transitions in Mp. The ε-closure of S′ can be computed in O(|p|) time. Also, by the construction of selecting NFAs given earlier, if δ ((s, [q]), *) (or δ ((s, [q]), fn:local-name(n))) is defined, then it maps to a single state rather than a set. Thus, the cardinality of S′ when computed by repeated calls to nextStates( ) is bounded by O(|p|).
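A state-transition step with ε-closure can be sketched in Python as below. The sketch assumes integer states, a transition table keyed by (state, symbol) with the hypothetical marker `EPS` for ε, and a callback `check(t)` that reports whether the qualifier attached to state t holds at the current node; applying the check to the target state is an assumption of this sketch, not a detail confirmed by the text.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

EPS = "eps"  # marker for an epsilon transition (naming is an assumption)
Delta = Dict[Tuple[int, str], int]


def eps_closure(states: Iterable[int], delta: Delta) -> Set[int]:
    """Add every state reachable from `states` via epsilon transitions."""
    closure = set(states)
    stack = list(closure)
    while stack:
        t = delta.get((stack.pop(), EPS))
        if t is not None and t not in closure:
            closure.add(t)
            stack.append(t)
    return closure


def next_states(states: Set[int], label: str, delta: Delta,
                check: Callable[[int], bool]) -> Set[int]:
    """Advance every state on the node label or on the wildcard."""
    reached = set()
    for s in states:
        for symbol in (label, "*"):
            t = delta.get((s, symbol))
            # Keep the target only if its qualifier holds at this node.
            if t is not None and check(t):
                reached.add(t)
    return eps_closure(reached, delta)
```

Because each (state, symbol) pair maps to a single state, the loop does a constant amount of work per state, so one call costs time proportional to the number of states.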
C. Top Down Method
The General Top Down Method is illustrated for an update u=insert const-expr into p. This is described by the algorithm topDown given in
Recall that u equals insert e into p1 in the above example. Given the root of the XML tree T0 of
Observe the following about topDown. First, it can be readily realized in a way that incurs no side effects and thus yields a complement query Quc in XQUERY. Second, if checkp( ) takes constant time, then for any update u on an XML tree T, Quc takes at most O(|T||p|) time, where p is the X query in u. That is, it takes time linear in |T|. A technique is presented to achieve this in the next section. Third, the use of selecting NFA allows us to simply return unchanged subtrees without further recursive processing.
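The three observations can be made concrete with a compact Python sketch of a top-down pass for an insert. It is an illustration under assumptions rather than the disclosed algorithm: `Node`, `closure`, `next_states` and `top_down` are hypothetical names, qualifiers are modeled as Python predicates on a node (a stand-in for checkp( )), and the root element is assumed to consume the first transition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

EPS = "eps"  # marker for an epsilon transition (naming is an assumption)


@dataclass(frozen=True)
class Node:
    label: str
    children: Tuple["Node", ...] = ()
    text: str = ""


Delta = Dict[Tuple[int, str], int]
Qual = Callable[[Node], bool]  # stand-in for checkp( ) on one qualifier


def closure(states: Set[int], delta: Delta) -> Set[int]:
    out, stack = set(states), list(states)
    while stack:
        t = delta.get((stack.pop(), EPS))
        if t is not None and t not in out:
            out.add(t)
            stack.append(t)
    return out


def next_states(states: Set[int], n: Node, delta: Delta,
                quals: List[Qual]) -> Set[int]:
    reached = set()
    for s in states:
        for symbol in (n.label, "*"):
            t = delta.get((s, symbol))
            if t is not None and quals[t](n):
                reached.add(t)
    return closure(reached, delta)


def top_down(n: Node, states: Set[int], delta: Delta, quals: List[Qual],
             final: int, e: Node) -> Node:
    """Return the tree rooted at n after 'insert e into p', without mutation."""
    nxt = next_states(states, n, delta, quals)
    if not nxt:
        return n  # no state reached: the subtree is unaffected, reuse it
    kids = tuple(top_down(c, nxt, delta, quals, final, e) for c in n.children)
    if final in nxt:
        kids += (e,)  # n is selected by p: add e as its rightmost child
    return Node(n.label, kids, n.text)
```

The early return on an empty state set corresponds to the third observation above: an unchanged subtree is handed back as-is instead of being copied node by node.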
Handling Expensive Qualifiers in One Pass
In this section, an algorithm, bottomUp, is presented that implements checkp( ) used in the topDown method of the previous section. Taken together with algorithm topDown, algorithm bottomUp produces a complementary query Quc for any u ∈ U such that Quc is guaranteed to execute in time linear in the size of the document, including the cost of implementing checkp( ). This algorithm may be implemented inside an XQUERY processor, or in XQUERY itself in the spirit of the rewriting of topDown. Practically, if complex qualifiers are handled well by the processor, the bottomUp algorithm is not necessary. However, (1) not all processors handle complex qualifiers efficiently; (2) it is possible to use bottomUp for only those qualifiers that are known to be handled poorly; and (3) novel techniques will be introduced in the next section to efficiently handle sequences of updates, and these techniques extend bottomUp.
Generally, given an update u over an XML tree T, bottomUp evaluates all the qualifiers in the XPATH expression p in u via a single bottom-up traversal of T, and annotates nodes of T with the truth values of related qualifiers. Given the annotations, at each node checkp( ) takes constant time to check the satisfaction of a qualifier at the node. This exemplary implementation of checkp( ) comes at the cost of executing bottomUp before topDown. BottomUp executes in linear time in |T|, and thus it does not increase the overall data complexity bound.
1. Evaluating Qualifiers
A. Qualifiers and Sub-Qualifiers
In the following algorithm, a list of qualifiers Q is processed that includes not only all the qualifiers appearing in p, but also all sub-expressions of these qualifiers. Furthermore, Q is topologically sorted such that for any expression e in Q, if s is a sub-expression of e, s appears before e in Q. To simplify the presentation, a “normalized” form of X qualifiers is adopted such that each path p in a qualifier is of the form ρ/p′, where ρ is one of *, // or ε[q], and p′ is a path. This normalization can be achieved by using the following rewriting rules: (1) l to */ε[label( )=l]; (2) p[q] to p/ε[q]; (3) p[q1] . . . [qn] to p[q], where q=q1 ∧ . . . ∧ qn; and (4) p=‘s’ to p[ε=‘s’]. The normalization process takes at most O(|p|²) time.
For the X query p1 given above, the list Q contains the expressions q3=[ε=‘keyboard’], q1=[pname[q3]], q6=[ε=‘HP’], q5=[sname[q6]], q4=[supplier[q5]], q9=[ε<15], q8=[price[q9]], q7=[supplier[q8]] and q2=[¬q4 ∧ ¬q7]. Note that all expressions are in the normal form mentioned above, and sub-expressions appear before their containing expression.
B. Dynamic Programming
An important step of bottomUp is the evaluation of qualifiers. It is done based on dynamic programming, as follows. Assume that the truth values of all the qualifiers q in Q are already known for (1) the immediate children of n (denoted by csatn(q)), and (2) all the descendants of n excluding n (denoted by dsatn(q)). Then, in order to compute the satisfaction of the qualifiers at n, denoted by satn(q), it suffices to do a constant amount of work per qualifier, as summarized in function QualDP( ) in
It is noted that care is needed for this recursion to work when computing satn (q) at the leaves n of the tree. To do this, csat ⊥ (q) (resp. dsat ⊥ (q)) is defined such that it is false when q ranges over expressions of the form */p; otherwise it is computed in the same way as in QualDP( ).
The truth values for all qualifiers in Q can be computed in time O(|Q|) at any node in a tree T.
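The dynamic-programming idea can be sketched in Python under simplifying assumptions. The qualifier encoding below (tagged tuples that refer to earlier entries of Q by index) and the names `qual_dp` and `bottom_up` are hypothetical and do not reproduce the QualDP( ) function of the disclosure; the sketch only shows how satn(q) follows from csatn(q) and dsatn(q) with constant work per qualifier when Q is topologically sorted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    children: Tuple["Node", ...] = ()
    text: str = ""


def qual_dp(n: Node, Q: List[tuple], csat: List[bool],
            dsat: List[bool]) -> List[bool]:
    """Compute sat_n(q) for each q in Q from csat_n and dsat_n."""
    sat: List[bool] = []
    for q in Q:
        kind = q[0]
        if kind == "label":      # label() = l
            v = n.label == q[1]
        elif kind == "text":     # epsilon = 's'
            v = n.text == q[1]
        elif kind == "and":
            v = sat[q[1]] and sat[q[2]]
        elif kind == "or":
            v = sat[q[1]] or sat[q[2]]
        elif kind == "not":
            v = not sat[q[1]]
        elif kind == "child":    # some child satisfies Q[q[1]]
            v = csat[q[1]]
        elif kind == "desc":     # n or some descendant satisfies Q[q[1]]
            v = sat[q[1]] or dsat[q[1]]
        else:
            raise ValueError(f"unknown qualifier kind: {kind}")
        sat.append(v)
    return sat


def bottom_up(n: Node, Q: List[tuple]) -> Tuple[List[bool], List[bool]]:
    """Return (sat_n, dsat_n); at a leaf csat and dsat are all False."""
    csat = [False] * len(Q)
    dsat = [False] * len(Q)
    for c in n.children:
        c_sat, c_dsat = bottom_up(c, Q)
        for i in range(len(Q)):
            csat[i] = csat[i] or c_sat[i]
            dsat[i] = dsat[i] or c_sat[i] or c_dsat[i]
    return qual_dp(n, Q, csat, dsat), dsat
```

For instance, the list `[("label", "pname"), ("text", "keyboard"), ("and", 0, 1), ("child", 2)]` encodes, in this toy notation, the test that a node has a pname child whose text is keyboard.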
C. Filtering NFA
Another important issue for bottomUp is to determine the list Q of qualifiers to be evaluated at each node of T. To do this, a notion of filtering NFA is introduced. Given an X expression p, an NFA is constructed, referred to as the filtering NFA of p and denoted by Mf, which is an extension of the selecting NFAs used in topDown. Generally, Mf is built on both the selecting path and the qualifiers of p, stripping off the logical connectives in the qualifiers; the states of Mf are also annotated with corresponding qualifiers. Mf is used to keep track of whether a node n is possibly involved in the node selecting of p and what qualifiers are needed at n. Filtering NFAs are illustrated with the following example in lieu of their long yet simple definition (which is similar to that of their selecting NFA counterpart).
The filtering NFA for the query p1 of the above example is depicted in
For a set S of states of a filtering NFA Mf, Q(S) denotes the list of all qualifiers appearing in the states of S, along with their sub-expressions, properly ordered with sub-expressions preceding their containing expressions.
The size of the filtering NFA Mf for an X query p is in O(|p|), since only a constant amount of information needs to be stored about each expression (as in a parse tree).
2. Bottom Up Computation of Qualifiers
Another aspect of the invention provides an overall algorithm for computing qualifiers of an X expression p via a single bottom-up traversal of an XML tree T.
The algorithm, bottomUp, is shown in
To compute satn(q) the algorithm associates two vectors of boolean values with n:
These vectors have the following properties. Assume that nc and ns are the left-most child and the immediate right sibling of n, respectively. Then, for q ∈ Q, rsatn
Taken together, the algorithm bottomUp first computes the set S′ of Mf states reached from S by inspecting the label of n and the transition function δ of Mf (lines 1-2). These steps mirror nextStates( ), but omit the checking of qualifiers. Next, bottomUp calls itself recursively on its right sibling (line 3) and left-most child (line 8), which return the children list Lc and the list of right siblings Ls. It uses QualDP( ) to compute satn (line 13). Finally, bottomUp returns a list (lines 14-21) with an element n′ as the head, which has the same label as n, carries children Lc and is annotated with satn, rsatn(q) and rdsatn(q); the tail of the list is the right-sibling list Ls.
In order to cope with the referential transparency (side-effect free) of XQUERY, the bottom-up traversal of the XML tree is simulated by recursively invoking bottomUp at the left-most child and the immediate right sibling of n, if any; in this way each node is visited at most once. Observe that the emptiness check of S′ (line 6) avoids recursively processing the subtrees that will contribute neither to the node-selecting path of p nor to the qualifiers needed in the node-selecting decision. That is, only if S′ is not empty is bottomUp invoked at the children of n and QualDP( ) called.
The combined complexity of bottomUp is O(|T||p|²) and its data complexity is linear in |T|. In practice, |p| is often small.
Consider again p1 of the above example. Given the root of the document T0 of
As another example, given p′=supplier//part and the root r of T0, bottomUp returns T0 right after checking the immediate children of r, since the filtering NFA for p′ reaches no state from r, which has no supplier children.
A. Combining bottomUp with topDown
Putting bottomUp and topDown together provides a complement query for XML updates in U. For example, a complement query Quc for insert operations u is shown in
B. Properties
The complement query Quc has several salient features. First, it is optimal: the entire computation of Quc(T) can be done with two passes of T, which are necessary for evaluating the embedded XPATH query p alone. Second, Quc can be readily coded in XQUERY. Indeed, the list Q and the NFAs can be coded in XML, sat, rsat and rdsat can be treated as XML attributes, and assignment statements can be easily replaced with side-effect free function calls. BottomUp and topDown are recursive functions to simplify the discussion and to facilitate their encoding in XQUERY. Finally, as noted above, the overhead of bottomUp is not required for simple qualifiers. This can be easily accommodated by the present algorithm by using checkp( ) from the last section for qualifiers that can be determined efficiently in the native processor, and removing such qualifiers from p before computing Mf in line 1 of
Alternatively, if integrated with an XQUERY processor, the computation of bottomUp can be combined with the loading of the document, and topDown can be integrated with the output of the new document. This also suggests an approach to implementing XML updates with two passes of the XML document in the entire computation.
C. Static Analysis of XML Updates
The analysis of XML updates at compile time might seem likely to improve performance. For example, given u=insert e into p, if the XPATH expression p is not satisfiable, then u can be simply rejected without being evaluated. This may help in certain simple cases but, unfortunately, not much in general. This is because it involves the satisfiability analysis of XPATH queries, i.e., the problem to determine, given an XPATH query p, whether or not there is any XML document T (with root r) such that r∥p∥ is nonempty. The analysis is generally too expensive to be practical: it is EXPTIME-hard for X, and is already PSPACE-hard for a subset of X without “//” and disjunction.
Complement Query of Multiple Updates
The problem of processing a sequence of XML updates is now addressed: given \(\vec{u} = u_1, \ldots, u_k\), where ui is an update defined in U, the task is to find a single complementary query \(Q^{c}_{\vec{u}}\) such that \(Q^{c}_{\vec{u}}(T) = u_k(\ldots(u_1(T))\ldots)\) for any XML tree T. As observed above, this is important for defining a (virtual) XML view in terms of a sequence of updates, among other things. In response to this, it is shown that it is always possible to find such a \(Q^{c}_{\vec{u}}\) by presenting a naive Nested Query Method. Another method is then presented for computing a more efficient \(Q^{c}_{\vec{u}}\) based on incremental computation techniques.
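The defining identity above is ordinary function composition, which a short Python sketch can make explicit. The names `nest`, `add_price` and `drop_price` are hypothetical, and a flat tuple of labels is used as a toy stand-in for an XML tree; the complement queries of the individual updates are simply assumed to be side-effect-free functions from trees to trees.

```python
from functools import reduce
from typing import Callable, Sequence, Tuple

Tree = Tuple[str, ...]          # toy stand-in for an XML tree
Query = Callable[[Tree], Tree]  # a side-effect-free complement query


def nest(queries: Sequence[Query]) -> Query:
    """Compose q1, ..., qk into one query computing qk(...(q1(T))...)."""
    def composed(tree: Tree) -> Tree:
        # Feed the result of each query into the next, left to right.
        return reduce(lambda acc, q: q(acc), queries, tree)
    return composed


def add_price(t: Tree) -> Tree:
    """Toy 'insert': append a price label."""
    return t + ("price",)


def drop_price(t: Tree) -> Tree:
    """Toy 'delete': remove every price label."""
    return tuple(x for x in t if x != "price")


# Order matters: a later delete cancels an earlier insert, but not vice versa.
insert_then_delete = nest([add_price, drop_price])
delete_then_insert = nest([drop_price, add_price])
```

Each composed stage traverses the whole intermediate result, which is the cost that an incremental treatment of the sequence aims to reduce.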
1. Nested Query Method
A single complementary query \(Q^{c}_{\vec{u}}\) can be computed for a sequence \(\vec{u} = u_1, \ldots, u_k\) of updates by leveraging the composability of XQUERY and the rewriting algorithms given in the last section, as follows: (1) compute a complement query Qu
The query template of
2. Incremental Approach
A. Multiple Updates
Assume that the X expression embedded in ui is pi, and that the input XML tree is T. The key idea of the algorithm multiUpdate is to (1) evaluate the qualifiers in all pi's via a single bottom-up traversal of T; that is, the evaluation of all the qualifiers is combined and conducted in a single pass of the tree; (2) process each update ui for i ∈ [1, k] via a top-down traversal of the tree; and (3) when each ui is performed, incrementally update the qualifiers of pj for j>i rather than recomputing them from scratch. The incremental computation is conducted on only those nodes affected by the update ui, i.e., the new nodes inserted into T and/or certain nodes on a path from the root to the nodes inserted/deleted/renamed by ui, instead of over the entire tree. The rationale is that ui typically incurs only small changes to the tree and thus only the updated parts need to be checked. This motivates the use of incremental techniques to minimize unnecessary recomputation of qualifiers in a sequence of XML updates.
B. Bottom Up Processing
Given a node n in an XML tree T, the function combinedBU evaluates the qualifiers of p1, . . . , pk at n and its descendants, via a bottom-up traversal of the subtree rooted at n. It returns the annotated XML tree T′ in which each node n is associated with satn(q), rsatn(q) and rdsatn(q). The details are omitted, as it is a mild extension of the bottomUp function given in
Note that combinedBU evaluates all the qualifiers in p1, . . . , pk, in a single pass of T rather than k passes. Furthermore, common qualifiers in these XPATH expressions are evaluated only once.
Consider a sequence \(\vec{u}_0\)=u1, u2, u3, where u1, u2, u3 are the insert, delete and rename operations given in (1), (2) and (4) of the above example, directed to a supplier element, respectively. Given \(\vec{u}_0\) and the XML tree T0 of
C. One Sweep: Combining Top-Down and Bottom-Up Processing
The function sweep, given in
The processing of ui is conducted via a traversal of ST similar to the algorithm bottomUp of
Once inserts and siblings have been handled, the set S′ of the Mp states reached at n is computed by calling the nextStates( ) function given in
If either no final state is reached or a rename is required, S′ is checked to see if it is empty (line 14), in which case the children of n can be directly used without a call to sweep (line 15), effectively pruning the search space. Otherwise the children of n are processed recursively (line 17). The rename is handled immediately after the recursive call (lines 19-22) by replacing n with a copy of n bearing the new label.
The qualifiers at n are re-evaluated (line 25) only if either renaming has taken place, or rsat or rdsat has changed at n's children (line 23). Moreover, sweep compares rsat and rdsat at os (lines 2 and 4) and ns (line 26), the old and new right siblings respectively, to see if its rsat or rdsat is changed (line 27). The values rsat and rdsat are recomputed at n (line 28) along the same lines as bottomUp of
Finally, sweep returns a list in which the head is ui(ST) with sat, rsat, rdsat incrementally evaluated, and the tail is the already-processed right-sibling list Ls (lines 29-30).
Recall the updates \(\vec{u}_0\)=u1, u2, u3 given in the above example. To handle \(\vec{u}_0\) over T0 of
D. Complexity
Function sweep for update ui takes at most \(O(|u_i||T_i| + (|p_{i+1}| + \cdots + |p_k|)\,|T_{i+1}|)\) time. Hence, the data complexity of the algorithm multiUpdate is linear in the size of the trees. When the changes incurred by updates are small, as commonly found in practice, multiUpdate outperforms the complement-query of
E. Discussion
Algorithms multiUpdate, combinedBU and sweep accommodate referential transparency and thus can be readily coded in XQUERY. Together they yield a single complement query QC in XQUERY, with linear-time data complexity, for a sequence u. The approach has three further advantages. First, it minimizes unnecessary recomputation, as just discussed. Second, the check for an empty state set (line 14 of sweep) avoids unnecessary processing of subtrees that are not affected by the update. Third, the incremental computation is combined with the processing of the update ui, instead of starting a separate bottom-up pass from scratch. Thus, the entire processing of ui is done in a single pass that visits each node at most once.
Given a sequence {right arrow over (u)}=u1, . . . , uk, it is possible that an update ui may cancel the effect of a previous update uj (j<i). For example, consider insert e into p followed by delete p′. If the XPATH expression p is contained in p′, i.e., any node reachable via p is also reachable via p′, then there is no need to execute the insert operation at all. This suggests that the containment problem for XPATH be considered, i.e., the problem of determining, given two XPATH expressions p and p′, whether r[[p]] ⊆ r[[p′]] for every XML tree T with root r. Unfortunately, the containment analysis may be impractical: it is EXPTIME-hard for X.
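The containment condition can be illustrated on a single tree with the following Python sketch, which uses the limited XPATH subset supported by the standard xml.etree.ElementTree module. The function name reached_subset, the sample document and the two paths are assumptions made for illustration. Note that a check on one tree is only a necessary condition: containment proper requires the inclusion to hold for every XML tree, which is the hard problem noted above.

```python
import xml.etree.ElementTree as ET


def reached_subset(root: ET.Element, p: str, p_prime: str) -> bool:
    """True if every node reached from root via p is also reached via p_prime.

    This tests the inclusion on ONE tree only; it does not decide containment.
    """
    via_p = {id(e) for e in root.findall(p)}
    via_p_prime = {id(e) for e in root.findall(p_prime)}
    return via_p <= via_p_prime


doc = ET.fromstring(
    "<db><part><supplier><price>10</price></supplier>"
    "<part><supplier/></part></part></db>"
)
# Suppliers of top-level parts are a subset of suppliers at any depth.
holds = reached_subset(doc, "./part/supplier", ".//supplier")
```

If an insert targets the first path and a later delete targets the second, every inserted element in this tree would be removed again by the delete.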
F. An Update Syntax for Defining Views
The ability to compute a complement query Q{right arrow over (u)}c from a sequence {right arrow over (u)} of updates suggests the following syntax for defining a view:
Given an XML tree T, the value of $x is the tree computed by Q{right arrow over (u)}c (Q(T)), where {right arrow over (u)}=u1, . . . , un. In terms of this update syntax, one can define a security view from an integration view Q, as indicated above. In addition, this allows a seamless combination of queries and updates, since $x can appear anywhere in a query where an XQUERY expression is allowed. Moreover, there are optimization techniques for combining the evaluation of Q with that of Qc, as would be apparent to a person of ordinary skill.
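As a rough illustration of defining a view by a delete without touching the source, the following Python sketch builds a new tree that omits the price children of suppliers from selected countries. It is written in Python rather than XQUERY, it is not the complement-query construction described herein, and the function name security_view is hypothetical; the element names supplier, country and price follow the running example.

```python
import xml.etree.ElementTree as ET
from typing import Set


def security_view(node: ET.Element, countries: Set[str]) -> ET.Element:
    """Return a NEW tree equal to node, minus the price children of any
    supplier whose country child is in `countries`. The input is not modified."""
    out = ET.Element(node.tag, dict(node.attrib))
    out.text, out.tail = node.text, node.tail
    hide_price = node.tag == "supplier" and any(
        c.tag == "country" and c.text in countries for c in node
    )
    for child in node:
        if hide_price and child.tag == "price":
            continue  # behaves like "delete", but only in the view
        out.append(security_view(child, countries))
    return out


src = ET.fromstring(
    "<db>"
    "<supplier><country>c1</country><price>5</price></supplier>"
    "<supplier><country>c2</country><price>7</price></supplier>"
    "</db>"
)
view = security_view(src, {"c1"})
```

Queries posed on view see only the second price, while src still contains both, which mirrors the non-destructive reading of the update syntax.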
System and Article of Manufacture Details
As is known in the art, the methods and apparatus discussed herein may be distributed as an article of manufacture that itself comprises a computer readable medium having computer readable code means embodied thereon. The computer readable program code means is operable, in conjunction with a computer system, to carry out all or some of the steps to perform the methods or create the apparatuses discussed herein. The computer readable medium may be a recordable medium (e.g., floppy disks, hard drives, compact disks, or memory cards) or may be a transmission medium (e.g., a network comprising fiber-optics, the world-wide web, cables, or a wireless channel using time-division multiple access, code-division multiple access, or other radio-frequency channel). Any medium known or developed that can store information suitable for use with a computer system may be used. The computer-readable code means is any mechanism for allowing a computer to read instructions and data, such as magnetic variations on a magnetic medium or height variations on the surface of a compact disk.
The computer systems and servers described herein each contain a memory that will configure associated processors to implement the methods, steps, and functions disclosed herein. The memories could be distributed or local and the processors could be distributed or singular. The memories could be implemented as an electrical, magnetic or optical memory, or any combination of these or other types of storage devices. Moreover, the term “memory” should be construed broadly enough to encompass any information able to be read from or written to an address in the addressable space accessed by an associated processor. With this definition, information on a network is still within a memory because the associated processor can retrieve the information from the network.
It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.