1. Field of the Invention
The present invention relates generally to the field of node identifiers for hierarchical structures. More specifically, the present invention is related to the computation of pseudo keys for node identifiers.
2. Discussion of Prior Art
The hierarchy of a structured document, such as an XML document, is often represented by nodes in a logical tree. Correspondingly, nodes stored in storage units referred to as blocks provide a physical representation of a structured document. Each node in a tree is assigned and identified by a unique node identifier (ID). Sets of nodes stored in blocks form node ID ranges. A node ID range indicates the location of logical nodes within physical blocks. While a node may be logically proximate or adjacent to another node in a tree, it is not necessarily stored in the same or even proximate physical block.
Index entries in a node ID range index describe the ranges of node IDs that exist for nodes in a given block. For each node ID range in a block, an index entry is created. An index entry contains a field for a high node ID as well as a field indicating the block containing the specified range. A high node ID indicates the highest node ID in a specified node ID range. While node traversals within node ID ranges are accomplished via physical links, node traversals across ranges are facilitated via node ID range index lookups using a destination node ID.
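By way of illustration only, the index entry and lookup described above can be sketched as follows. The class name `NodeIdRangeIndex`, the helper `parse`, the block labels, and the use of infinity as a stand-in for ‘x’ are assumptions made for this sketch and do not appear in the disclosure. The lookup returns the first entry whose high node ID is greater than or equal to the destination node ID.

```python
from bisect import bisect_left

X = float("inf")  # assumption: numeric stand-in for the symbolic digit 'x'


def parse(node_id):
    """Turn a dotted node ID such as '1.1.1.x.1' into a tuple that compares in document order."""
    return tuple(X if part == "x" else int(part) for part in node_id.split("."))


class NodeIdRangeIndex:
    """Hypothetical in-memory index holding one entry per node ID range."""

    def __init__(self):
        self._keys = []     # parsed high node IDs, kept sorted
        self._entries = []  # (high_node_id, block) pairs in the same order

    def add(self, high_node_id, block):
        key = parse(high_node_id)
        pos = bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._entries.insert(pos, (high_node_id, block))

    def lookup(self, destination_node_id):
        """Return the entry of the first range whose high node ID is >= the destination."""
        pos = bisect_left(self._keys, parse(destination_node_id))
        return self._entries[pos] if pos < len(self._entries) else None


# Illustrative entries only; the block labels are made up.
index = NodeIdRangeIndex()
index.add("1.2.4", "block-3")
index.add("1.1", "block-1")
index.add("1.1.1.3.2.2", "block-2")
```

Under these assumptions, `index.lookup("1.1.1")` selects the entry whose high node ID is 1.1.1.3.2.2, and a destination beyond every high node ID yields no entry.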
In storage architectures utilizing node ID ranges to describe their contents, node insertions and updates often require the splitting as well as the merging of pre-existing node ID ranges. Insertions into a node hierarchy affect only the node ID ranges into which nodes are inserted, because logical links are maintained between ranges. However, in some embodiments, insertions and deletions of nodes in a tree hierarchy necessitate the splitting of node ID ranges. A split node ID range further necessitates an additional index entry in a node ID range index. Keys for these new entries are found by traversing the nodes of the original node ID range and applying rules to determine the key for each new index entry.
Whatever the precise merits, features, and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention. Therefore, there is a need in the art to compute keys to define node ID ranges without necessitating the traversal of an original node ID range.
A system and method of the present invention provide for the determination of pseudo keys to facilitate the bounding of node ID ranges. A pseudo previous high key is computed by decrementing the last digit of the lowest node ID value in a split-formed node ID range by one and by appending ‘x’.‘x’, where ‘x’ represents an arbitrary value greater than any digit used in a node ID. Conversely, zero is used to represent an arbitrary value less than any digit used in a node ID. A pseudo previous high key is computed such that no previous sibling, and no descendant of a previous sibling, will have a node ID higher in value.
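The computation just described can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names, the dotted-string representation of node IDs, and the use of infinity to model ‘x’ are assumptions of the sketch.

```python
X = float("inf")  # assumption: numeric stand-in for 'x', greater than any digit


def parse(node_id):
    """Turn a dotted node ID into a tuple that compares in document order."""
    return tuple(X if part == "x" else int(part) for part in node_id.split("."))


def pseudo_previous_high_key(lowest_node_id):
    """Decrement the last digit of the lowest node ID by one and append 'x'.'x'."""
    parts = lowest_node_id.split(".")
    last = parts[-1]
    if last == "x" or int(last) < 1:
        # A trailing 'x' or zero cannot be decremented to a usable digit in this sketch.
        raise ValueError("last digit must be a positive integer: " + lowest_node_id)
    parts[-1] = str(int(last) - 1)
    return ".".join(parts + ["x", "x"])


key = pseudo_previous_high_key("4.5")  # "4.4.x.x"
```

For the node ID 4.5, the resulting key 4.4.x.x orders above the previous sibling 4.4, above any descendant of 4.4, and above a node such as 4.4.x.1 inserted between the siblings, while still ordering below 4.5 itself.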
In a first embodiment, pseudo keys are computed for use in node ID ranges that have been split. The determination of a high node ID value for a split node ID range is facilitated by the use of pseudo keys. The need to search for a real previous high key is obviated by the computation of a pseudo previous high key. Additionally, the computation of a pseudo key lessens the logic necessary for node ID splits, and lessens the number of node ID index entries created during subsequent node insertions and deletions.
In a second embodiment, pseudo keys are used to define boundaries of a sub-tree. A sub-tree is bounded by the range determined by a pseudo previous high key for its root node and a pseudo sub-tree high key. A pseudo sub-tree high key is computed by appending ‘x’ to a sub-root node ID. A pseudo sub-tree high key is ordered higher than any node ID in a sub-tree having as root, a given node ID. That is, node IDs assigned to currently existing or newly inserted nodes in a sub-tree rooted at the specified node, including that of the specified node itself, are contained within a determined boundary. A pseudo sub-tree low key is computed by appending zero followed by one to a node ID. A pseudo sub-tree low key is ordered lower than any node ID in a sub-tree having as root, the specified node. A pseudo end of document key is given by the value of ‘x’, where ‘x’ again represents an arbitrary value greater than any digit used in a node ID. A pseudo end of document key is ordered higher than node IDs of other nodes in a structured document.
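The sub-tree keys described above may be illustrated with the following sketch. The function names, the constant name, and the tuple-based ordering (in which a node ID orders before any longer ID that it prefixes) are assumptions made for illustration only.

```python
X = float("inf")  # assumption: numeric stand-in for 'x', greater than any digit


def parse(node_id):
    """Turn a dotted node ID into a tuple that compares in document order."""
    return tuple(X if part == "x" else int(part) for part in node_id.split("."))


def pseudo_subtree_high_key(root_id):
    """Append 'x' to the sub-root node ID."""
    return root_id + ".x"


def pseudo_subtree_low_key(root_id):
    """Append zero followed by one to the node ID."""
    return root_id + ".0.1"


PSEUDO_END_OF_DOCUMENT_KEY = "x"


def is_descendant(node_id, root_id):
    """True when node_id orders strictly between the low and high pseudo keys of root_id.

    Under the ordering assumed here the root's own ID orders before the low key,
    so the root itself is not reported as a descendant.
    """
    low = parse(pseudo_subtree_low_key(root_id))
    high = parse(pseudo_subtree_high_key(root_id))
    return low < parse(node_id) < high
```

With a sub-root of 4.5, the node IDs 4.5.1 and 4.5.2.x.1 fall between 4.5.0.1 and 4.5.x, whereas the next sibling 4.6 does not; every node ID built from ordinary digits orders below the end of document key.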
In a third embodiment, a plurality of dimensioned node IDs are formed by appending more than one ‘x’ to a node ID. Thus, the collation of persistent versioned nodes that order either higher than or lower than existing sibling nodes is allowed.
a illustrates a single node ID range in a logical XML tree, associated node ID range index entry, and corresponding physical block.
b illustrates a split node ID range, node ID range index entries containing real keys, and corresponding physical blocks.
c illustrates a second split in node ID range, node ID range index entries containing real keys, and corresponding physical blocks.
d illustrates a split node ID range, node ID range index entries containing pseudo keys, and corresponding physical blocks.
e illustrates a second split in node ID range, node ID range index entries containing pseudo keys, and corresponding physical blocks.
While this invention is illustrated and described in a preferred embodiment, the invention may be produced in many different configurations. There is depicted in the drawings, and will herein be described in detail, a preferred embodiment of the invention, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and the associated functional specifications for its construction and is not intended to limit the invention to the embodiment illustrated. Those skilled in the art will envision many other possible variations within the scope of the present invention.
Node IDs are generated based on steps discussed in commonly assigned patent application U.S. Ser. No. 10/605,448, referenced in the background section. In accordance with one embodiment of this method, nodes inserted between siblings have more digits than their previous or next siblings. A node X 138 inserted between node C 104 and node M 106 has a node ID value of 1.1.1.x.1, where ‘x’ represents an arbitrary value greater than any digit used in a node ID. Assigning node X 138 the node ID value 1.1.1.x.1 ensures that no descendant of node C 104 is greater in value than, or ordered after, inserted node X 138. This is because descendants of node C 104 have node IDs generated such that no digit reaches the value of ‘x’.
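The ordering property described above can be checked with a short sketch. It assumes, for illustration only, that node C has node ID 1.1.1, that node M has node ID 1.1.2, and that 1.1.1.3.2.2 is a descendant of node C; the helper name `sort_key` and the use of infinity for ‘x’ are likewise assumptions.

```python
X = float("inf")  # assumption: numeric stand-in for 'x', greater than any digit


def sort_key(node_id):
    """Map a dotted node ID to a tuple; tuples compare digit by digit, prefix first."""
    return tuple(X if part == "x" else int(part) for part in node_id.split("."))


node_ids = [
    "1.1.2",        # node M (assumed ID)
    "1.1.1.x.1",    # inserted node X
    "1.1.1.3.2.2",  # a descendant of node C (assumed)
    "1.1.1",        # node C (assumed ID)
]
ordered = sorted(node_ids, key=sort_key)
```

Sorting places node C first, then its descendant, then the inserted node, then node M, so the inserted node orders after the whole sub-tree of its previous sibling and before its next sibling.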
Nodes are stored in blocks based on a method as described in co-pending application “Hierarchical Storage Architectures for Node ID Ranges”. Sets of nodes stored in blocks form node ID Ranges. Shown in
Node traversals within node ID ranges are accomplished via physical links, while node traversals across ranges are accomplished via node ID range index 300 lookups based on a current node ID. For example, in order to traverse from node B 202 to node C 204, a node ID range index 300 lookup using the node ID value 1.1.1 of destination node C 204 is performed. A lookup operation using node C 204 results in the use of node ID range index entry 304, whose high node ID 312 has the value 1.1.1.3.2.2. Insertions into a node hierarchy affect only the ranges into which nodes are inserted, because logical links are maintained between ranges. In some embodiments, insertions and deletions of nodes in a tree hierarchy necessitate the splitting of node ID ranges. A split node ID range further necessitates an additional node ID range index entry in node ID range index 300. High node ID values for new node ID range index entries are obtained by traversing nodes of an original node ID range and subsequently applying rules to the traversed node IDs. For a detailed discussion of these rules, please refer to co-pending application, “Hierarchical Storage Architectures for Node ID Ranges”.
The determination of a high node ID value for a node ID range is facilitated by the use of pseudo keys. Rather than simply selecting as a high node ID the highest node ID value in a node ID range, a pseudo key is computed. The computation of a pseudo key lessens the logic necessary for node ID splits, and lessens the number of node ID index entries created during subsequent insertions and deletions. In a first embodiment, pseudo keys are computed for use in node ID ranges that have been split.
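A minimal sketch of how a pseudo key might be used when a node ID range is split follows. It assumes that the nodes from a chosen node onward move to a new block and keep the original range high key, while the nodes left behind receive a pseudo previous high key computed from the first moved node alone, with no traversal of the original range. The function names, block labels, and this particular division of the range are assumptions of the sketch.

```python
X = float("inf")  # assumption: numeric stand-in for 'x', greater than any digit


def parse(node_id):
    """Turn a dotted node ID into a tuple that compares in document order."""
    return tuple(X if part == "x" else int(part) for part in node_id.split("."))


def pseudo_previous_high_key(lowest_node_id):
    """Decrement the last digit by one and append 'x'.'x' (positive last digit assumed)."""
    parts = lowest_node_id.split(".")
    parts[-1] = str(int(parts[-1]) - 1)
    return ".".join(parts + ["x", "x"])


def split_index_entries(range_high_key, old_block, new_block, first_moved_node_id):
    """Index entries after nodes from first_moved_node_id onward move to new_block."""
    lower_key = pseudo_previous_high_key(first_moved_node_id)
    return [(lower_key, old_block), (range_high_key, new_block)]


def covering_entry(entries, node_id):
    """First entry, in key order, whose key is >= node_id."""
    for key, block in entries:
        if parse(node_id) <= parse(key):
            return key, block
    return None


entries = split_index_entries("1.1.1.3.2.2", "block-1", "block-2", "1.1.1.3")
```

Here the lower range is keyed by 1.1.1.2.x.x. A node later inserted with ID 1.1.1.2.x.1 is still covered by that entry, so in this sketch the insertion requires no change to the index.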
a illustrates an initial stage of storage in which nodes are stored in a single block 438. The highest node ID in a block is known as a range high key. If a node is to be inserted into block 438, it is first determined if node ID assigned to a node to be inserted is less than a range high key for block 438. In
In another embodiment, pseudo keys are used to define boundaries of a sub-tree. For example, a sub-tree having as root node H 414 as shown in
In yet another embodiment, a plurality of dimensioned pseudo keys are formed by appending more than one ‘x’ to an existing node ID before appending a known digit. A known digit for a pseudo sub-tree low key is one. For example, a pseudo key for node ID 4.5 is also computed as 4.4.x.x.1, 4.4.x.x.x.1, and so on. Thus, a provision is made for the collation of persistent versioned nodes that order either higher than or lower than existing sibling nodes.
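The example above can be reproduced with the following sketch. Decrementing the last digit before appending is inferred from the 4.5 example rather than stated as a rule; the function name, the `dimensions` parameter, and the use of infinity for ‘x’ are assumptions made for illustration.

```python
X = float("inf")  # assumption: numeric stand-in for 'x', greater than any digit


def parse(node_id):
    """Turn a dotted node ID into a tuple that compares in document order."""
    return tuple(X if part == "x" else int(part) for part in node_id.split("."))


def dimensioned_pseudo_key(node_id, dimensions, known_digit=1):
    """Decrement the last digit, append `dimensions` copies of 'x', then the known digit."""
    if dimensions < 2:
        raise ValueError("a dimensioned key appends more than one 'x'")
    parts = node_id.split(".")
    parts[-1] = str(int(parts[-1]) - 1)  # inferred from the 4.5 -> 4.4 example
    return ".".join(parts + ["x"] * dimensions + [str(known_digit)])


two_dim = dimensioned_pseudo_key("4.5", 2)    # "4.4.x.x.1"
three_dim = dimensioned_pseudo_key("4.5", 3)  # "4.4.x.x.x.1"
```

Under the assumed ordering, each additional ‘x’ yields a key that orders higher than the previous one while remaining below 4.5, which is what allows successive versions to be collated between existing siblings.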
Additionally, the present invention provides for an article of manufacture comprising computer readable program code contained within implementing one or more modules to compute pseudo keys for existing node IDs, create index entries for computed pseudo keys, and insert index entries for computed pseudo keys. Furthermore, the present invention includes a computer program code-based product, which is a storage medium having program code stored therein which can be used to instruct a computer to perform any of the methods associated with the present invention. The computer storage medium includes any of, but is not limited to, the following: CD-ROM, DVD, magnetic tape, optical disc, hard drive, floppy disk, ferroelectric memory, flash memory, ferromagnetic memory, optical storage, charge coupled devices, magnetic or optical cards, smart cards, EEPROM, EPROM, RAM, ROM, DRAM, SRAM, SDRAM, or any other appropriate static or dynamic memory or data storage devices.
Implemented in computer program code based products are software modules for: (a) computing a pseudo key or pseudo keys for an existing node ID; (b) creating a node ID range index record; and (c) inserting into a node ID range index said created index entry.
A system and method have been shown in the above embodiments for the effective implementation of pseudo keys in node ID range based storage architecture. While various preferred embodiments have been shown and described, it will be understood that there is no intent to limit the invention by such disclosure, but rather, it is intended to cover all modifications falling within the spirit and scope of the invention, as defined in the appended claims. For example, the present invention should not be limited by software/program or specific computing hardware.
The above enhancements are implemented in various computing environments. For example, the present invention may be implemented on a conventional IBM PC or equivalent. All programming and data related thereto are stored in computer memory, static or dynamic, and may be retrieved by the user in any conventional computer storage and/or display (i.e., CRT) format. The programming of the present invention may be implemented by one of skill in the art of object-oriented programming.
This application is related to the application entitled “Extensible Decimal Identification System for Ordered Nodes”, now U.S. Ser. No. 10/605,448, and co-pending application entitled, “Hierarchical Storage Architectures using Node ID Ranges” both of which are hereby incorporated by reference in their entirety, including any appendices and references thereto.