This application is related to the application entitled “Extensible Decimal Identification System for Ordered Nodes”, now U.S. Ser. No. 10/605,448, which is hereby incorporated by reference in its entirety, including any appendices and references thereto. This application is also related to the application entitled “Isolated Ordered Regions (IOR) Node Order”, now U.S. Ser. No. 10/604,450, which is hereby incorporated by reference in its entirety, including any appendices and references thereto.
Field of Invention
The present invention relates generally to the field of hierarchical node management. More specifically, the present invention is related to node storage, location, and grouping via physical and logical identifiers.
Discussion of Prior Art
A tree structure comprising nodes is a type of data structure in which each element is attached to one or more elements directly beneath it. The connections among elements in a tree structure are called branches. Trees are often called inverted trees because they are normally drawn with the root at the top. Inverted trees are the data structures used to represent hierarchical file structures. In this case, the leaves are files and the other elements above the leaves are directories.
Tree structures have been used in prior art data processing systems to organize data. A disadvantage of such prior art, however, is that it fails to provide a node identification system for ordered nodes wherein adding or deleting a child node (or a subtree of nodes) from a hierarchical structure of nodes still maintains the order and relationships between the parent, child, and sibling nodes. Node identification solutions based upon assigning preorder traversal and/or postorder traversal numbers can provide only an ordering solution, as such solutions cannot be used to identify sibling relationships between nodes.
Yet another disadvantage of such prior art systems is that they fail to address storage architectures for hierarchical data providing the advantage of having physical links between the nodes (for fast traversal) and also providing the advantage of having logical links between nodes (for easier updating, versioning, and reorganization).
The following references provide a general teaching in ordering nodes, but they fail to provide for a solution that incorporates advantages of both the physical and logical storage architectures.
The patent application publication to Sutherland et al. (2002/0114341 A1) discloses a peer-to-peer storage system that includes a storage coordinator that centrally manages distributed storage resources in accordance with system policies. The storage resources are otherwise unused portions of storage media, e.g., hard disks, that are included in the devices such as personal computers, workstations, laptops, file servers, and so forth, that are connected to a corporate computer network. The devices are referred to collectively as “storage nodes.” The storage coordinator manages the distributed storage resources by assigning the nodes to various groups and allocating the storage resources on each of the nodes in a given group to maintaining dynamically replicated versions of the group files.
The patent application publication to Schnelle et al. (2003/0070144) discloses the conversion of an XML encoded dataset into a minimal set of SQL tables. In the disclosed method, a hierarchical structure in the XML encoded dataset is identified. A node element set for the XML encoded dataset is determined, wherein each node element in the node element set is a discrete level of the hierarchical structure of the dataset. One or more nodes of the XML encoded dataset are determined, each node being an instance of a node element. A unique node identifier is allocated to each node. Then, an SQL node table containing one or more records is generated, each record corresponding to a respective one of the allocated node identifiers. An SQL ancestry table is optionally generated to define the inter-relationships among nodes of the identified hierarchical structure of the XML encoded dataset.
The patent application publication to Moses (2003/0061216) discloses a system and method for controlling access to data within a hierarchically organized document, such as an XML document. Elements may have their access rights specified, for example as a variable in an XML tag. If not specified within an element of the document, access rights are inherited from its nearest ancestor. Specified access rights may refer to a collection of entitlement expressions, which describe with arbitrarily fine granularity which users and user types may access the data.
The non-patent literature to Ives et al., titled “An XML query engine for network-bound data,” generally discloses how XML documents are modeled using a tree structure.
The above-mentioned prior art fails to provide a solution that incorporates advantages of both the physical and logical storage architectures. Whatever the precise merits, features, and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention.
The present invention provides for a dynamic storage architecture for hierarchical data, wherein such an architecture is used to store parsed documents (e.g., XML documents) and manage their traversals, prefetches, updates, and reorganization. Nodes of a document are assigned Node IDs based on the extensible decimal identification system for ordered nodes, wherein Node IDs encode parent-child relationships and new Node IDs are assigned to new nodes without affecting existing Node IDs.
The architecture groups sets of nodes in Node ID order forming ranges of Node IDs. The ranges are written to blocks. The highest Node ID (or, alternatively, lowest Node ID or a computed Node ID key) in a range and the block's Block ID are entered into a Node ID Range Index. Physical links exist between nodes in the same range while logical links, in the form of index entries in the Node ID Range Index, exist between nodes in different ranges. This architecture allows for single to multiple ranges per block as well as single to multiple subtree roots per range.
Node traversals within ranges are accomplished via physical links, while node traversals between ranges are accomplished via index lookups based on Node IDs. Updates in the hierarchy only affect the ranges that contain the nodes being updated and no other ranges because of the logical links maintained between ranges. The Node ID Range Index enables inter-range node traversals as well as prefetching of blocks containing ranges based on Node ID order.
a-b collectively illustrate the insertion of an XML document as per the teaching of the present invention.
While this invention is illustrated and described in a preferred embodiment, the invention may be produced in many different configurations. There is depicted in the drawings, and will herein be described in detail, a preferred embodiment of the invention, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and the associated functional specifications for its construction and is not intended to limit the invention to the embodiment illustrated. Those skilled in the art will envision many other possible variations within the scope of the present invention.
The present invention provides a dynamic storage architecture for hierarchical data, wherein such an architecture can be used to store parsed documents, such as parsed XML documents. The disclosed storage architecture allows for prefetching relevant parts of a document based on the document order, and allows for easy updating and reorganization of the document. This storage architecture has the advantages of a physical architecture because physical links between nodes in the same block allow for fast traversals, and the storage architecture also has the advantages of a logical architecture because logical links between nodes in different blocks allow for easier updating, versioning and reorganization.
The architecture groups sets of nodes in Node ID order forming ranges of Node IDs. The ranges are written to blocks. The highest Node ID in a range and the block's Block ID are entered into a Node ID Range Index. It should be noted that although a specific example of entering the highest Node ID (in the Node ID Range Index) is provided, other examples, such as entering a lowest Node ID or a computed Node ID key, are within the scope of the present invention. For example, if a set of nodes had the following Node IDs: 1.2, 1.2.4.5, 1.2.4.5.5.5.5, then the highest key would be 1.2.4.5.5.5.5 and the lowest key would be 1.2. A computed key would be 1.2.x, where ‘x’ is higher than any number.
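The three kinds of key in the example above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: it assumes that Node IDs compare component by component as tuples of integers, that the computed key is built from the longest prefix shared by all Node IDs in the range, and that ‘x’ can be modeled by `math.inf`. The helper names `parse_node_id`, `common_prefix`, and `range_keys` are hypothetical.

```python
import math

def parse_node_id(text):
    """Turn '1.2.4.5' into (1, 2, 4, 5) so tuple ordering can be used (assumed ordering)."""
    return tuple(int(part) for part in text.split("."))

def common_prefix(ids):
    """Longest leading run of components shared by every Node ID in the set."""
    prefix = []
    for parts in zip(*ids):  # stops at the shortest Node ID
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)

def range_keys(id_texts):
    """Return (highest, lowest, computed) candidate keys for one range of Node IDs."""
    ids = sorted(parse_node_id(t) for t in id_texts)
    highest = ids[-1]
    lowest = ids[0]
    # Assumption: the computed key is the shared prefix followed by a component
    # that is higher than any number, modeled here with math.inf.
    computed = common_prefix(ids) + (math.inf,)
    return highest, lowest, computed

highest, lowest, computed = range_keys(["1.2", "1.2.4.5", "1.2.4.5.5.5.5"])
print(highest)   # (1, 2, 4, 5, 5, 5, 5)
print(lowest)    # (1, 2)
print(computed)  # (1, 2, inf)
```

Under these assumptions the computed key sorts after every Node ID in the range, so it can serve as an upper bound for the range in the same way the highest Node ID does.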
Physical links exist between nodes in the same range while logical links, in the form of index entries in the Node ID Range Index, exist between nodes in different ranges. This architecture allows for single to multiple ranges per block as well as single to multiple subtree roots per range.
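The logical links described above can be sketched as a sorted list of (highest Node ID, Block ID) entries searched with Python's standard `bisect` module. This is an illustrative sketch only: the class name `NodeIdRangeIndex`, its methods, and the Block IDs `"B1"` to `"B3"` are hypothetical, Node IDs are assumed to compare as integer tuples, and a node is assumed to belong to the first range whose highest Node ID is greater than or equal to its own.

```python
import bisect

class NodeIdRangeIndex:
    """Hypothetical in-memory stand-in for the Node ID Range Index."""

    def __init__(self):
        self._keys = []    # highest Node ID of each range, kept sorted
        self._blocks = []  # Block ID of each range, parallel to _keys

    def add_range(self, highest_id, block_id):
        """Record that the range ending at highest_id is stored in block_id."""
        pos = bisect.bisect_left(self._keys, highest_id)
        self._keys.insert(pos, highest_id)
        self._blocks.insert(pos, block_id)

    def block_for(self, node_id):
        """Follow a logical link: find the block whose range covers node_id."""
        pos = bisect.bisect_left(self._keys, node_id)
        if pos == len(self._keys):
            return None  # node_id lies beyond the last indexed range
        return self._blocks[pos]

    def blocks_in_order(self, start_id):
        """Blocks to prefetch, in Node ID order, starting at start_id's range."""
        pos = bisect.bisect_left(self._keys, start_id)
        return self._blocks[pos:]

index = NodeIdRangeIndex()
index.add_range((1, 2, 4), "B1")
index.add_range((1, 6), "B2")
index.add_range((1, 8, 3), "B3")

print(index.block_for((1, 4, 1)))        # B2
print(index.blocks_in_order((1, 4, 1)))  # ['B2', 'B3']
```

Because the same Block ID may be registered for more than one range, the sketch also accommodates multiple ranges per block.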
In
Node traversals within ranges are accomplished via physical links, while node traversals between ranges are accomplished via index lookups based on Node IDs. For example in
The present invention's architecture as depicted in
With Node IDs assigned to every node in the hierarchy, the nodes can now be grouped in Node ID order, as shown in
In
It should be noted that all the nodes that have index entries pointing to them are marked as Indexed Nodes. This is so that if the Indexed Node is moved with a set of nodes that have higher Node IDs than the Indexed Node to another block, the index entry corresponding to the Indexed Node can be updated. For example in
Alternatively, if the Node ID Range Index is always consulted and the current Highest Node ID remembered during an update, then the nodes that are indexed need not be marked, as any node movement will only involve nodes in the range bounded by the current Highest Node ID. For example, node ‘M’ will never be moved with ‘A’, ‘B’, and ‘C’ because ‘C’ is the node with the highest Node ID in the range being updated.
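The index-consulting update described above might look like the following sketch. It rests on several assumptions that are not stated in the disclosure: one range per block, a fixed block capacity, a split policy that moves the upper half of an overflowing range (including its highest node) to a new block, and Node IDs that compare as integer tuples. The function name `insert_node` and all Block IDs are hypothetical.

```python
import bisect

def insert_node(keys, block_ids, blocks, node_id, capacity, new_block_id):
    """Insert node_id into the range that covers it, splitting on overflow.

    keys / block_ids form the Node ID Range Index (parallel sorted lists);
    blocks maps Block ID -> sorted list of Node IDs (assumed: one range per block).
    """
    # Consult the index first: the entry found bounds the only range that can change.
    pos = bisect.bisect_left(keys, node_id)
    if pos == len(keys):
        raise KeyError("node falls beyond the last indexed range")
    block_id = block_ids[pos]
    nodes = blocks[block_id]
    bisect.insort(nodes, node_id)
    if len(nodes) <= capacity:
        return
    # Overflow (assumed policy): move the upper half, including the node with
    # the highest Node ID, to a new block.
    mid = len(nodes) // 2
    blocks[new_block_id] = nodes[mid:]
    blocks[block_id] = nodes[:mid]
    # The existing entry keeps its key but now points at the new block.
    block_ids[pos] = new_block_id
    # The nodes left behind form a new range with its own highest Node ID.
    keys.insert(pos, blocks[block_id][-1])
    block_ids.insert(pos, block_id)

keys = [(1, 4), (1, 8)]
block_ids = ["B1", "B2"]
blocks = {"B1": [(1,), (1, 2), (1, 4)], "B2": [(1, 6), (1, 8)]}

insert_node(keys, block_ids, blocks, (1, 2, 2), capacity=3, new_block_id="B3")
print(keys)       # [(1, 2), (1, 4), (1, 8)]
print(block_ids)  # ['B1', 'B3', 'B2']
```

In this sketch the range stored in block `B2` and its index entry are untouched by the update, which mirrors the point that node movement is confined to the range bounded by the current Highest Node ID.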
In
a-b collectively illustrate the insertion of an XML document.
As events for nodes ‘A’ to ‘J’ are received shown in
Now, the second set of nodes shown in
Additionally, the present invention provides for an article of manufacture comprising computer readable program code contained within implementing one or more modules to physically store and manage logical groups of XML nodes. Furthermore, the present invention includes a computer program code-based product, which is a storage medium having program code stored therein which can be used to instruct a computer to perform any of the methods associated with the present invention. The computer storage medium includes any of, but is not limited to, the following: CD-ROM, DVD, magnetic tape, optical disc, hard drive, floppy disk, ferroelectric memory, flash memory, ferromagnetic memory, optical storage, charge coupled devices, magnetic or optical cards, smart cards, EEPROM, EPROM, RAM, ROM, DRAM, SRAM, SDRAM, or any other appropriate static or dynamic memory or data storage devices.
Implemented in computer program code based products are software modules for: (a) grouping sets of nodes in Node ID order to form one or more ranges of Node IDs, wherein each of the ranges is written to a block of memory; (b) maintaining one or more physical links between nodes in the same range; (c) maintaining an index of entries in a Node ID Range Index, each entry corresponding to a highest Node ID (or, alternatively, lowest Node ID or a computed Node ID key) in a range, said index entries defining one or more logical links between nodes in different ranges; and wherein traversals within ranges are accomplished via the physical links and traversals between ranges occur based on an index lookup of Node IDs.
A system and method have been shown in the above embodiments for the effective implementation of a hierarchical storage architecture using Node ID ranges. While various preferred embodiments have been shown and described, it will be understood that there is no intent to limit the invention by such disclosure, but rather, it is intended to cover all modifications falling within the spirit and scope of the invention, as defined in the appended claims. For example, the present invention should not be limited by type of Node ID recorded in the Node ID Range Index (e.g., Highest Node ID, Lowest Node ID, Computed Node Key, etc.), software/program, or computing environment.
The above enhancements are implemented in various computing environments. All programming and data related thereto are stored in computer memory, static or dynamic, and may be retrieved by the user in any of: conventional computer storage, display (e.g., CRT) and/or hardcopy (i.e., printed) formats. The programming of the present invention may be implemented by one of skill in the art of database programming.
Number | Name | Date | Kind |
---|---|---|---|
5285528 | Hart | Feb 1994 | A |
5568638 | Hayashi et al. | Oct 1996 | A |
6175835 | Shadmon | Jan 2001 | B1 |
6804677 | Shadmon et al. | Oct 2004 | B2 |
6901403 | Bata et al. | May 2005 | B1 |
7383286 | Hamanaka et al. | Jun 2008 | B2 |
20020087596 | Lewontin | Jul 2002 | A1 |
20020114341 | Sutherland et al. | Aug 2002 | A1 |
20020120598 | Shadmon et al. | Aug 2002 | A1 |
20020138353 | Schreiber et al. | Sep 2002 | A1 |
20030061216 | Moses | Mar 2003 | A1 |
20030070144 | Schnelle et al. | Apr 2003 | A1 |
20030110150 | O'Neil et al. | Jun 2003 | A1 |
20030145041 | Dunham et al. | Jul 2003 | A1 |
20030204515 | Shadmon et al. | Oct 2003 | A1 |
20040122950 | Morgan et al. | Jun 2004 | A1 |
20040172387 | Dexter et al. | Sep 2004 | A1 |
20050018152 | Ting et al. | Jan 2005 | A1 |
20050033733 | Shadmon et al. | Feb 2005 | A1 |
20050055334 | Krishnamurthy | Mar 2005 | A1 |
20050102256 | Bordawekar et al. | May 2005 | A1 |
Number | Date | Country |
---|---|---|
1265130 | Dec 2002 | EP |
Entry |
---|
Quanzhong Li et al. “Indexing and Querying XML Data for Regular Path Expressions” Proceedings of the 27th VLDB Conference 2001 pp. 361-370. |
McHugh et al. “Indexing Semistructured Data” Stanford University pp. 1-21. |
Jagadish et al. “Timber: A native XML database” The VLDB Journal 2002, pp. 274-291. |
Fiebig et al. “Natix: A Technological Overview” Web databases and Web services 2002, LNCS 2493, pp. 12-33, 2003. |
Fiebig et al. “Anatomy of a native XML base management system” The VLDB Journal 2002 pp. 292-314. |
Kanne et al, “Efficient storage of XML data” Jun. 16, 1999. |
Zachary Ives et al., “An XML query engine for network-bound data,” The International Journal of Very Large Data Bases, V11, N4, Dec. 13, 2002, pp. 380-402. |
Yuanying Mo et al., “Storing and Maintaining Semistructured Data Efficiently in an Object-Relational Database,” The Third International Conference on Web Information Systems Engineering (WISE '00), Dec. 12-14, 2002, Singapore, 10pgs. |
Number | Date | Country | |
---|---|---|---|
20060004792 A1 | Jan 2006 | US |