This invention is directed to addressing an element in a document written in a language such as XML (Extensible Markup Language) or HTML (Hypertext Markup Language). It is more particularly directed to updating a designation expression for an element when a document is modified.
Structured documents written in XML, HTML, or other languages used for data exchanges over networks such as the Internet (referred to as structured documents hereafter) may have meta information, such as annotations, that addresses particular elements in the structured documents. The structured documents may also have modification rules written in the documents in advance, under which the documents are modified. To add this meta information and these modification rules to the structured documents, XPath (XML Path Language) is often used to address particular positions in the structured documents so that external documents are referred to.
XPath is a language for addressing particular parts of a structured document. Using XPath as addressing information allows arbitrarily specifying those positions in the structured document to which annotations are added or modifications are made. In the subsequent description, data written in XPath will also be simply referred to as an XPath.
Specifically, XPath is written in the following manner.
As described above, XPath allows arbitrarily addressing particular elements in a structured document such as an XML or HTML/XML document. However, if the structured document subjected to designation is modified, elements or their positions in the document change. Therefore, the position designation in XPath may get out of order, and desired elements may not be properly addressed.
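The failure mode described above can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. The two documents and their `id` attributes are hypothetical and are included only to make the shift visible; they do not come from the embodiment.

```python
import xml.etree.ElementTree as ET

# Hypothetical unmodified document: the designation targets the second <b>.
before = ET.fromstring('<a><b id="x"/><b id="y"/><b id="z"/></a>')
# Hypothetical modified document: a new <b> has been inserted at the front.
after = ET.fromstring('<a><b id="new"/><b id="x"/><b id="y"/><b id="z"/></a>')

# Positional designation, relative to the root element <a>.
xpath = "./b[2]"

target_before = before.find(xpath).get("id")
target_after = after.find(xpath).get("id")

print(target_before)  # y
print(target_after)   # x (the unchanged expression now addresses another element)
```

The expression itself is still valid after the modification, which is why the problem may go unnoticed unless the designation is updated together with the document.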
Conventionally, to keep the desired elements properly addressed in the structured document in this case, XPath descriptions have to be modified manually. This requires significant efforts and imposes a heavy burden on a developer of a system involving this structured document.
Thus, an aspect of the invention is to keep a desired element properly addressed in a structured document in which particular elements are addressed, even if the structured document is modified.
Another aspect of the invention is to provide means for automatically updating an XPath addressing a particular element in a structured document based on a modification made to the structured document if the structured document is modified.
An example embodiment of the invention to achieve the above objects is implemented as a data processing method for addressing a predetermined element or sets of elements in a structured document. The data processing method comprises the steps of: when a structured document having an element addressed by predetermined addressing information is modified, inputting the structured document to analyze a modification; and updating the addressing information according to the analyzed modification made to the structured document so that the addressing information addresses a corresponding element or corresponding elements in the modified structured document.
An alternate embodiment of the invention to achieve the above aspects is implemented as an addressing information generation system for performing such data processing. The addressing information generation system comprises: a difference computation unit for computing a difference between structured documents; and an addressing information generation unit for generating addressing information from addressing information that addresses a part of a particular structured document based on information on the difference computed by the difference computation unit, the generated addressing information addressing a corresponding part of the other structured document.
These and other aspects, features, and advantages of the present invention will become apparent upon further consideration of the following detailed description of the invention when read in conjunction with the drawing figures, in which:
The present invention provides methods, apparatus and systems to keep a desired element properly addressed in a structured document in which particular elements are addressed, even if the structured document is modified. The invention also provides means for automatically updating an XPath addressing a particular element in a structured document based on a modification made to the structured document if the structured document is modified.
An example of a method of the invention is implemented as a data processing method for addressing a predetermined element or sets of elements in a structured document. The method includes the steps of: when a structured document having an element addressed by predetermined addressing information is modified, inputting the structured document to analyze a modification; and updating the addressing information according to the analyzed modification made to the structured document so that the addressing information addresses a corresponding element or corresponding elements in the modified structured document.
Specifically, the step of analyzing a modification made to the structured document comprises: converting an unmodified version and a modified version of the structured document into tree-structured data items; and computing a difference between the tree-structured data items. The addressing information is updated based on the difference between the tree-structured data items.
More specifically, the processing of computing the difference between the tree-structured data items tracks any component of the tree-structured data items that is moved by the operations required to transform one tree-structured data item into the other in accordance with the modification of the structured document.
Preferably, an XPath may be used as the addressing information for addressing the element in the structured document.
Then, updating the addressing information comprises updating an XPath describing the addressing information by regenerating LocationSteps forming the XPath based on the difference between the unmodified version and the modified version of the structured document.
The invention to achieve the above objects is also implemented as an addressing information generation system for performing such data processing. The addressing information generation system comprises: a difference computation unit for computing a difference between structured documents; and an addressing information generation unit for generating addressing information from addressing information that addresses a part of a particular structured document based on information on the difference computed by the difference computation unit, the generated addressing information addressing a corresponding part of the other structured document.
More preferably, the addressing information generation system further comprises a document analysis unit for analyzing structures of the structured documents and converting the structures into tree-structured data items, wherein the difference computation unit computes the difference by comparing the tree-structured data items corresponding to the structured documents converted by the document analysis unit.
The invention to achieve the above objects may also be implemented as a method for computing a difference between at least two tree-structured data items. The method comprises the steps of: reading at least two tree-structured data items to be processed from memory to compare the at least two tree-structured data items and creating an operation sequence, in which each operation for transforming one of the tree-structured data items into the other tree-structured data item is expressed as a combination of predetermined operations on a component of a tree-structure; and changing operations in the operation sequence that are interpreted as a movement of a component into an operation of moving the component.
The components of the tree-structures include nodes and subtrees of the trees. The combination of predetermined operations on a component of the tree-structure is a combination of basic operations such as inserting, removing, and modifying the component.
More specifically, the step of changing the operation sequence in the list comprises adding an operation of moving a component of the tree-structured data items to the operation sequences in place of a pair of operations of removing and inserting the component in the operation sequences.
The step further comprises replacing, based on a predetermined rule, an operation of modifying a component of the tree-structured data items in the operation sequences with a different operation that involves moving the component.
The invention to achieve the above objects is also implemented as an annotation server for managing annotation data made for an HTML/XML document. The annotation server comprises: difference computation means for computing, when the HTML/XML document for which the annotation data has been made is modified, a difference between an unmodified version and a modified version of the HTML/XML document; and XPath update means for updating, based on difference information obtained from computation by the difference computation means, an XPath associating the annotation data with a part of the HTML/XML document.
The invention to achieve the above objects is also implemented as a program for controlling a computer so that the computer performs processing corresponding to the steps of the data processing method or the method for computing a difference described above, or the invention is also implemented as a program for causing a computer to function as the system for updating addressing information or the annotation server described above. The program may be stored in and distributed as a magnetic disk, optical disk, semiconductor memory, or other storage media, or distributed through a network.
Now, the invention will be described in detail below based on an embodiment illustrated in the appended drawings.
The computer shown in
It is noted that
As shown in
These components are virtual software blocks provided by a program that is deployed in the main memory 103 shown in
The structured documents and the XPath to be processed are stored in a predetermined area, for example, an area in the hard disk 105, and read by the CPU 101 for XPath update processing according to this embodiment.
In this embodiment shown in
The difference computation unit 20 computes differences between the trees of the unmodified and modified structured documents converted by the document analysis unit 10. As a result, the details of the modifications made to the structured document to be processed are recognized. This embodiment proposes a novel method for computing the differences suitable for the XPath update to be performed later. Now, a difference computation algorithm of this method will be described below.
As background knowledge, a conventional difference computation algorithm generally used will be described. While various algorithms have been proposed for computing differences between two trees, a typical difference computation algorithm is the one that computes a minimum-cost operation sequence.
As shown in
In this case, the algorithm for computing a minimum-cost operation sequence determines that the transformation of the tree 161 into the tree 162 requires operations of modifying the content of the node b into the content of the node d (Modify (b→d)) and modifying the content of the node d into the content of the node b (Modify (d→b)). This is because these operations enable the tree 161 to be transformed into the tree 162 at the minimum processing cost of 2 according to the above-mentioned processing cost value.
As shown in
In this case, the algorithm for computing a minimum-cost operation sequence determines that the transformation of the tree 171 into the tree 172 requires operations of removing the node b from the tree 171 (RemoveNode (b)) and inserting the node b into a position shown in the tree 172 (InsertNode (b)). Again, this is because these operations enable the tree 171 to be transformed into the tree 172 at the minimum processing cost of 2 according to the above-mentioned processing cost value.
However, this algorithm for computing a minimum-cost operation sequence is not suitable for this embodiment, because the aim of the difference computation in this embodiment is the automatic XPath update. In the example of
Similarly, in the example of
The above discussion also applies to MoveTree, an operation of moving a subtree (partial tree structure within a tree). It should be understood that although the subsequent description addresses only the processing of nodes for simplicity, MoveTree may be similarly analyzed.
Based on the above discussion, a description will be given of the difference computation algorithm executed by the difference computation unit 20 and suitable for this embodiment. The difference computation algorithm used in this embodiment is designed to track objects (nodes and subtrees) that have been moved due to modification of a tree.
The difference computation unit 20 receives inputs of the tree T corresponding to the unmodified document P and the tree T′ corresponding to the modified document P′ from memory means such as the main memory 103, where the trees have been temporarily stored. Then, it analyzes operations required for modifying the tree T into the tree T′ in terms of the basic operations, RemoveNode, InsertNode, and Modify, and generates a list L of obtained operation sequences. The analysis may be performed using a conventional technique, such as the above described algorithm for computing a minimum-cost operation sequence. The generated list L of the operation sequences is temporarily stored in memory means such as the main memory 103. Then, the difference computation unit 20 analyzes the list L stored in the main memory 103 to detect MoveNode as shown in
In an InsertNode analysis shown in
If a RemoveNode (n) is in the list L, it makes up a MoveNode (n) in combination with the InsertNode (n). Therefore, a MoveNode (n) is added to the list L (step 303), and the InsertNode (n) and the RemoveNode (n) are deleted from the list L (step 304). In this manner, the difference computation unit 20 processes all InsertNode in the list L.
In a RemoveNode analysis shown in
If an InsertNode (n) is in the list L, it makes up a MoveNode (n) in combination with the RemoveNode (n). Therefore, a MoveNode (n) is added to the list L (step 403), and the RemoveNode (n) and the InsertNode (n) are deleted from the list L (step 404). In this manner, the difference computation unit 20 processes all RemoveNode in the list L.
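The pairing of InsertNode and RemoveNode operations described above can be sketched as follows. This is a minimal illustration and not the embodiment's implementation: the `Op` record and the `detect_moves` function are hypothetical names, and identifying a node by its content string is an assumption made for brevity. Because the InsertNode analysis and the RemoveNode analysis are symmetric, the sketch scans only the InsertNode entries, which yields the same pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str  # "InsertNode", "RemoveNode", "Modify", or "MoveNode"
    node: str  # content identifying the node (an assumption of this sketch)


def detect_moves(ops):
    """Replace each InsertNode(n)/RemoveNode(n) pair in the list with MoveNode(n)."""
    result = list(ops)
    for op in list(result):
        # Skip other operations and InsertNode entries that were already paired.
        if op.kind != "InsertNode" or op not in result:
            continue
        partner = Op("RemoveNode", op.node)
        if partner in result:
            # The pair is interpreted as a movement of the node.
            result.remove(op)
            result.remove(partner)
            result.append(Op("MoveNode", op.node))
    return result


# Hypothetical list L: node b is removed and reinserted, node c is newly inserted.
L = [Op("RemoveNode", "b"), Op("InsertNode", "b"), Op("InsertNode", "c")]
print(detect_moves(L))
```

In this example the InsertNode for c has no matching RemoveNode and therefore remains in the list unchanged, while the pair for b is collapsed into a single MoveNode.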
In a Modify analysis shown in
If a Modify (ny→n1) is in the list L, then the difference computation unit 20 checks whether the content of the node nx is identical with the content of the node ny (that is, nx=ny) (step 503). If nx=ny, it can be interpreted to mean that the positions of the node n1 and nx (=ny) have been exchanged. Therefore, a MoveNode (n1) and a MoveNode (ny) are added to the list L (step 504), and the Modify (n1→nx) and the Modify (ny→n1) are deleted from the list L (step 513).
If nx≠ny, it can be interpreted to mean that the node n1 has been moved to the original position of the node ny in the tree T, the node ny has been removed, and another node nx has been newly inserted into the original position of the node n1. Therefore, an InsertNode (nx), a RemoveNode (ny), and a MoveNode (n1) are added to the list L (step 505), and further the InsertNode analysis and the RemoveNode analysis shown in
If a Modify (ny→n1) is not in the list L in step 502, then the difference computation unit 20 checks whether an operation InsertNode (n1) for the node n1 is in the list L (step 507 in
If an InsertNode (n1) is not in the list L in step 507, then the difference computation unit 20 checks whether an operation RemoveNode (nx) for the node n1 is in the list L (step 510 in
If a RemoveNode (nx) is not in the list L in step 510, it can be interpreted to mean that the content of the node n1 has been simply modified into nx, and therefore the processing simply terminates. In this manner, the difference computation unit 20 processes all Modify in the list L.
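The exchange case of the Modify analysis (nx=ny) and the simple-modification case can be sketched as follows. The sketch is deliberately partial: it handles only a pair Modify (n1→nx) and Modify (nx→n1), and leaves every other Modify in the list untouched, so the branches for nx≠ny and for the InsertNode and RemoveNode checks are not covered. The `Op` tuple and the `detect_swaps` function are hypothetical names, and nodes are identified by content as an assumption.

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    kind: str                  # "Modify" or "MoveNode"
    node: str                  # original content of the node
    new: Optional[str] = None  # new content (used by Modify only)


def detect_swaps(ops):
    """Rewrite Modify(n1->nx) plus Modify(nx->n1) as MoveNode(n1) and MoveNode(nx)."""
    result = list(ops)
    for op in list(result):
        # Skip non-Modify entries and Modify entries consumed by an earlier pair.
        if op.kind != "Modify" or op not in result:
            continue
        n1, nx = op.node, op.new
        # Look for the reverse modification, i.e. the case nx = ny.
        partner = next(
            (p for p in result
             if p.kind == "Modify" and p.node == nx and p.new == n1 and p is not op),
            None,
        )
        if partner is not None:
            # The two nodes are interpreted as having exchanged positions.
            result.remove(op)
            result.remove(partner)
            result += [Op("MoveNode", n1), Op("MoveNode", nx)]
    return result


# Hypothetical list L: b and d exchange positions, e is simply modified into f.
L = [Op("Modify", "b", "d"), Op("Modify", "d", "b"), Op("Modify", "e", "f")]
print(detect_swaps(L))
```

The Modify for e has no counterpart, so it is treated as a simple change of content and stays in the list.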
Thus, the differences between the trees T and T′ are computed. The obtained difference data is temporarily stored in memory means, such as the main memory 103, to be used by the XPath update unit 30. As realized in these three analyses, in this embodiment, all operations for the tree T to be transformed into the tree T′ that can be interpreted as node movements are detected as moving operations Move so that they can be used in the subsequent XPath update processing.
The XPath update unit 30 receives an input of the computation result of the differences between the trees T and T′ obtained by the difference computation unit 20 and an input of an XPath for the unmodified document P (referred to as XPath (P) hereafter). Based on these inputs, the XPath update unit 30 then generates and outputs an XPath for the modified document P′ (referred to as XPath (P′) hereafter).
Referring to
Now, the XPath update processing performed by the XPath update unit 30 will be described in detail below.
The XPath (P) is formed of layers of paths (LocationStep) Ls (i) (i=0, 1, 2, . . . , n). In the unmodified document P, each set of nodes to be addressed by the LocationStep Ls (i), which is a NodeSet S (i), is computed in processing performed by the XPath interpreter 31. Similarly, in the modified document P′, a NodeSet S (i)′ to be addressed by the LocationStep Ls (i) is computed.
On the other hand, the node correspondence table 32, which represents the node correspondences between the unmodified and modified documents P and P′, is generated from the unmodified document P, the modified document P′, and the differences D between the unmodified and modified documents P and P′. The generated node correspondence table 32 is stored in memory means, such as a register of the CPU 101 or the main memory 103, in the computer shown in
Based on the node correspondence table 32 and the NodeSet S (i) to be addressed by the LocationStep Ls (i) in the unmodified document P, a NodeSet S (i)″ is obtained.
The difference between the NodeSet S (i)′ and the NodeSet S (i)″ is that the NodeSet S (i)′ is obtained simply by applying path patterns to the modified document P′, whereas the NodeSet S (i)″ is obtained by tracking modifications based on the difference information. It is noted that both the NodeSet S (i)′ and the NodeSet S (i)″ are sets of nodes in the modified document P′.
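The distinction between the two node sets can be made concrete with a small sketch based on Python's standard `xml.etree.ElementTree` module. The documents are hypothetical, and the correspondence dictionary below is only a stand-in for the node correspondence table 32: it is built by matching hypothetical `id` attributes rather than from real difference data.

```python
import xml.etree.ElementTree as ET

# Hypothetical unmodified document P and modified document P'.
# In P', the <b> that was under <c> has been moved to be a child of <a>.
p = ET.fromstring('<a><b id="1"/><b id="2"/><b id="3"/><c><b id="4"/></c></a>')
p_mod = ET.fromstring('<a><b id="1"/><b id="2"/><b id="3"/><b id="4"/><c/></a>')

# Stand-in for the node correspondence table (old node -> new node).
new_by_id = {e.get("id"): e for e in p_mod.iter() if e.get("id")}
correspondence = {old: new_by_id[old.get("id")] for old in p.iter() if old.get("id")}

step = "./b"  # the LocationStep, written relative to the root element <a>

s = p.findall(step)                        # NodeSet S(i) in P
s_prime = p_mod.findall(step)              # S(i)': the path applied directly to P'
s_double = [correspondence[n] for n in s]  # S(i)'': S(i) tracked into P'

ids_prime = [n.get("id") for n in s_prime]
ids_double = [n.get("id") for n in s_double]
print(ids_prime)   # ['1', '2', '3', '4']
print(ids_double)  # ['1', '2', '3']
```

Both lists contain nodes of the modified document only; they differ because the moved node matches the path pattern in P′ although it was not addressed in P.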
Next, the XPath generator 33 compares the NodeSet S (i)′ and the NodeSet S (i)″, and updates the LocationStep Ls (i) in the XPath (P). The details of the update will be described later. Repeating this process for i (i=0 to n) provides LocationStep Ls (j)′ (j=0, 1, 2, . . . , m). This LocationStep Ls (j)′ directly represents an updated XPath (P′).
Referring to
If the NodeSet S (i)″ is included in the NodeSet S (i)′, then the XPath generator 33 generates a LocationStep from the nodes addressed by the LocationStep Ls (j-1)′ to the nodes included in the NodeSet S (i)″ (step 1103, 1104).
In this manner, the LocationSteps corresponding to the modified document P′ are generated, and the XPath (P) is modified into the XPath (P′).
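One simple way to produce LocationSteps that address exactly a given set of tracked nodes is to emit a positional step for each node. The sketch below takes this approach as an illustrative stand-in; it is not the generation strategy referenced in this description, and the `positional_path` function and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def positional_path(root, node):
    """Build an absolute path such as /a/b[2] that addresses exactly `node`."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        # XPath positions are 1-based among siblings matching the node test.
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


# Hypothetical modified document P'; assume the first three <b> children of <a>
# are the nodes obtained by tracking (the NodeSet S(i)'').
p_mod = ET.fromstring("<a><b/><b/><b/><b/><c/></a>")
tracked = list(p_mod)[:3]

paths = [positional_path(p_mod, n) for n in tracked]
print(paths)  # ['/a/b[1]', '/a/b[2]', '/a/b[3]']
```

Paths of this form are precise but verbose, which is one reason a later generalization into a single, simpler expression may be desirable where the notation allows it.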
Some types of XPath notation allow the LocationSteps generated in step 1104 to be integrated into a simple expression by generalizing them based on a predetermined generalization rule. If the LocationStep Ls (j)′ cannot be generated based on a given generalization rule, the LocationSteps generated in step 1104 may be directly output while processing for an error is performed, such as displaying an alarm window or a window prompting for correction.
The generation of the LocationSteps in step 1104 may be performed, for example, with a known strategy disclosed in the literature 1 below. The integration of the LocationSteps may be performed, for example, with a known strategy disclosed in the literature 2 below.
Literature 1: Nov. 8, 2001: A Visual Approach to Authoring XPath Expressions Accepted for Markup Languages: Theory and Practice, Vol. 3, No. 2. This is a paper originally published in the Proceedings Extreme Markup Languages 2001, pp. 1-15, Montreal, Canada (14-17, Aug. 2001). http://ares.trl.ibm.com/freedom/doc/extml2001/abe0114.html
Literature 2: Jul. 13, 2001: XSLT Stylesheet Generation by Example with WYSIWYG Editing Accepted for the presentation at International Symposium on Applications and the Internet (SAINT 2002) http://ares.trl.ibm.com/freedom/doc/saint2002/saint2002.html
Now, the method for updating an XPath will be described based on examples of the tree modification.
Here, suppose that an XPath (P) “/a/b” for the unmodified document P addresses the three child nodes b of the node a. The expression “/a/b” addresses all child nodes b of the node a. Where the unmodified document P has been modified into the modified document P′, the “/a/b” would, if used as it is, address the four child nodes b of the node a in the modified document P′. However, the node b that has been moved to be a child of the node a also existed in the unmodified document P, where it was not addressed by the “/a/b”. Therefore, it should not be addressed by the “/a/b” in the modified document P′ either.
In this embodiment, the XPath update unit 30 can refer to the node correspondence table 32 generated according to the differences D computed by the difference computation unit 20, and know that the three nodes b addressed by the XPath (P) in the unmodified document P correspond to the first to third nodes b from left among the four nodes b in the modified document P′, as shown in
Referring to
As described above, this embodiment enables detecting a difference between an unmodified version and a modified version of a modified structured document, and based on the difference, automatically updating a corresponding XPath. However, in practice, the XPath may not be updated exactly according to the intention of a developer of a system involving the structured document and the XPath. In addition, the developer may want to further modify the XPath after it is automatically updated. Therefore, this embodiment can also be implemented as an interactive XPath update tool.
When the certain structured document 1511 annotated under the control of the annotation server 1500 is modified, the annotation server 1500 causes the display unit of the console 1510 to display an unmodified version and a modified version of the structured document 1511, and the interaction window 1512. The annotation server 1500 then asks the annotation developer whether to update the XPath according to the modification made to the structured document 1511. If the annotation developer clicks on the button “Yes” on the interaction window 1512, the XPath is automatically updated by the functions corresponding to the document analysis unit 10, the difference computation unit 20, and the XPath update unit 30 of the annotation server 1500. If the annotation developer clicks on the button “Delete”, the XPath is deleted and the annotation for the structured document 1511 is cleared. For an element (node) simply removed or modified in the structured document 1511, reference to its XPath becomes impossible. Here, a message may be output for notifying the annotation developer of the removal of the annotated element and asking the developer whether to add the annotation to another element.
Although the foregoing describes addressing elements in a structured document such as XML or HTML/XML document using XPath, this embodiment may also be applied to addressing elements in a structured document by any other means. Specifically, differences between an unmodified version and a modified version of a modified structured document may be computed by a function corresponding to the difference computation unit 20 described in this embodiment, and modifications may be made as suitable for means for addressing elements in the structured document (such as addressing information). Then, the details of element designations may be appropriately updated according to modifications made to the structured document.
Thus, as described above, the invention can keep a desired element properly addressed in a structured document in which particular elements are addressed, even if the structured document is modified. The invention can also provide means for automatically updating an XPath addressing a particular element in a structured document based on a modification made to the structured document if the structured document is modified.
Variations described for the present invention can be realized in any combination desirable for each particular application. Thus particular limitations, and/or embodiment enhancements described herein, which may have particular advantages to the particular application need not be used for all applications. Also, not all limitations need be implemented in methods, systems and/or apparatus including one or more concepts of the present invention.
The present invention can be realized in hardware, software, or a combination of hardware and software. A visualization tool according to the present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system—or other apparatus adapted for carrying out the methods and/or functions described herein—is suitable. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which—when loaded in a computer system—is able to carry out these methods.
Computer program means or computer program in the present context include any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after conversion to another language, code or notation, and/or reproduction in a different material form.
Thus, the invention includes an article of manufacture which comprises a computer usable medium having computer readable program code means embodied therein for causing a function described above. The computer readable program code means in the article of manufacture comprises computer readable program code means for causing a computer to effect the steps of a method of this invention. Similarly, the present invention may be implemented as a computer program product comprising a computer usable medium having computer readable program code means embodied therein for causing a function described above. The computer readable program code means in the computer program product comprises computer readable program code means for causing a computer to effect one or more functions of this invention. Furthermore, the present invention may be implemented as a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for causing one or more functions of this invention.
It is noted that the foregoing has outlined some of the more pertinent objects and embodiments of the present invention. This invention may be used for many applications. Thus, although the description is made for particular arrangements and methods, the intent and concept of the invention is suitable and applicable to other arrangements and applications. It will be clear to those skilled in the art that modifications to the disclosed embodiments can be effected without departing from the spirit and scope of the invention. The described embodiments ought to be construed to be merely illustrative of some of the more prominent features and applications of the invention. Other beneficial results can be realized by applying the disclosed invention in a different manner or modifying the invention in ways known to those familiar with the art.
Number | Date | Country | Kind |
---|---|---|---|
2002-206202 | Jul 2002 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
6785673 | Fernandez et al. | Aug 2004 | B1 |
6848079 | Ito | Jan 2005 | B2 |
20020054090 | Silva et al. | May 2002 | A1 |
Number | Date | Country | |
---|---|---|---|
20040088652 A1 | May 2004 | US |