Retrieving information from an XML database can be costly in terms of both space and time. This is partly because the semi-structured nature of XML does not lend itself to easy indexing. Additionally, maintaining indexes on XML documents can be difficult and time-consuming. Most current XML databases have dealt with this problem by restricting the scope of their indexes, allowing only a single attribute or a single element within an index. Others do not index XML natively as XML, instead converting it internally to a relational storage system in order to index it.
In response to these and other problems, in one embodiment, a system is provided for compound indexing of documents comprising semi-structured hierarchical data. The system comprises a database for storing a document comprising hierarchical, semi-structured data; a database engine for performing operations on and in connection with data stored in the database; and an index definition document ("IDD") for defining an index for the document; wherein the database engine applies the IDD to the document to generate a set of index keys for the document.
This disclosure relates generally to XML databases and, more specifically, to a system and method for providing simple and compound indexes for such databases. It is understood, however, that the following disclosure provides many different embodiments or examples. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
The system 10 further includes a database engine 16 for performing various operations on and in connection with data stored in the XML database 12, including the XML document 13. As will be described in greater detail hereinbelow, an XML index definition document (“XIDD”) 18 is provided by the application 14 to the database engine 16. The database engine 16 stores the XIDD 18 in a dictionary collection 20 of the database 12 and generates a set of index keys 22 by applying the XIDD to the XML document 13. The index keys 22 point back to the nodes in the XML document 13 from which they were generated.
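The flow just described (store the index definition in a dictionary collection, apply it to a document, and keep keys that point back to the originating nodes) can be sketched as follows. This is a minimal, hypothetical illustration, not the XFLAIM API: the class `SketchEngine`, its methods, the single-element "definition", and the choice of document-order positions as node ids are all assumptions made for the example.

```python
import bisect
import xml.etree.ElementTree as ET


class SketchEngine:
    """Hypothetical stand-in for database engine 16; not the XFLAIM API."""

    def __init__(self):
        self.dictionary = {}  # plays the role of dictionary collection 20
        self.index_keys = {}  # index name -> sorted list of (key value, node id)

    def add_index(self, name, element_name, document):
        # Store a (greatly simplified) definition, then apply it to the document.
        self.dictionary[name] = {"element": element_name}
        keys = []
        # Assumption: node ids are assigned in document order, starting at 1.
        for node_id, elem in enumerate(document.iter(), start=1):
            if elem.tag == element_name:
                keys.append(((elem.text or "").strip(), node_id))
        # Keep keys sorted so lookups can binary-search them.
        self.index_keys[name] = sorted(keys)

    def lookup(self, name, value):
        """Return the ids of the nodes whose key equals value."""
        keys = self.index_keys[name]
        i = bisect.bisect_left(keys, (value,))
        hits = []
        while i < len(keys) and keys[i][0] == value:
            hits.append(keys[i][1])
            i += 1
        return hits


doc = ET.fromstring(
    "<People>"
    "<Person><LastName>Lee</LastName></Person>"
    "<Person><LastName>Kim</LastName></Person>"
    "</People>"
)
engine = SketchEngine()
engine.add_index("by_last_name", "LastName", doc)
print(engine.index_keys["by_last_name"])  # [('Kim', 5), ('Lee', 3)]
print(engine.lookup("by_last_name", "Lee"))  # [3]
```

Because each key carries the id of the node it came from, a lookup returns node ids that can be used to fetch the matching nodes directly.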
In one embodiment, the XML database 12 is a model-based native XML database, such as Novell Corporation's XFLAIM database. Although portions of the embodiments described herein may be described with reference to the XFLAIM database, such descriptions are for purposes of example only; the embodiments described herein may be advantageously implemented using other types of XML databases as well.
As described in the aforementioned related application, which has been incorporated by reference in its entirety, the database engine 16 creates an in-memory tree structure that corresponds to the tree structure of the XIDD 18. This structure is used to populate the index keys 22 as XML documents are added to, modified in, or deleted from the database 12.
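As a rough illustration of how an in-memory tree mirroring an index definition might be walked in step with each incoming document to keep keys current, consider the sketch below. The names `DefNode`, `collect_keys`, and `KeyMaintainer` are hypothetical, and treating a modification as a delete followed by an add is an assumption of this sketch; the related application may describe a more efficient scheme.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class DefNode:
    """One node of a hypothetical in-memory tree mirroring an index definition."""
    tag: str
    is_key: bool = False
    children: list = field(default_factory=list)


def collect_keys(def_node, elem, out):
    """Walk the document in step with the definition tree, gathering key values."""
    if elem.tag != def_node.tag:
        return
    if def_node.is_key:
        out.append((elem.text or "").strip())
    for child_def in def_node.children:
        for child in elem:
            collect_keys(child_def, child, out)


class KeyMaintainer:
    """Keeps a set of (key value, document id) pairs in sync with document updates."""

    def __init__(self, definition):
        self.definition = definition
        self.keys = set()

    def add(self, doc_id, root):
        values = []
        collect_keys(self.definition, root, values)
        self.keys.update((v, doc_id) for v in values)

    def delete(self, doc_id):
        self.keys = {k for k in self.keys if k[1] != doc_id}

    def modify(self, doc_id, new_root):
        # Assumption: a modification is handled as a delete followed by an add.
        self.delete(doc_id)
        self.add(doc_id, new_root)


definition = DefNode("Person", children=[DefNode("LastName", is_key=True)])
m = KeyMaintainer(definition)
m.add(1, ET.fromstring("<Person><LastName>Lee</LastName></Person>"))
m.add(2, ET.fromstring("<Person><LastName>Kim</LastName></Person>"))
m.modify(1, ET.fromstring("<Person><LastName>Park</LastName></Person>"))
print(sorted(m.keys))  # [('Kim', 2), ('Park', 1)]
```

Walking the definition tree rather than the whole document means only the branches named in the definition are visited when keys are generated.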
As previously noted, the most basic unit of information in the XML database 12 is a node, such as an ElementComponent (also referred to herein as an "element" or "element node") or an AttributeComponent (also referred to herein as an "attribute" or an "attribute node"). Every node in the database 12 is uniquely addressable by a NodeId. Within the XML document 13, one node can be placed subordinate to another node; the nodes are then said to have a "parent-child" relationship. A node may have at most one parent node. Nodes that have the same parent are referred to as "siblings".
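The node model described above (unique ids, at most one parent, siblings sharing a parent) can be captured in a small data structure. This is a hypothetical sketch: the `Node` class, its fields, and the counter used to issue ids are illustrative assumptions, not the database's actual node representation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)  # hypothetical source of unique node ids


@dataclass(eq=False)
class Node:
    """Hypothetical node: an element or an attribute, addressed by node_id."""
    name: str
    kind: str = "element"  # "element" or "attribute"
    value: Optional[str] = None
    node_id: int = field(default_factory=lambda: next(_ids))
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list)

    def add_child(self, child: Node) -> Node:
        # Enforce the rule that a node has at most one parent.
        if child.parent is not None:
            raise ValueError(f"node {child.node_id} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

    def siblings(self) -> list[Node]:
        # Siblings are the other children of this node's parent.
        if self.parent is None:
            return []
        return [n for n in self.parent.children if n is not self]


person = Node("Person")
name = person.add_child(Node("Name"))
first = name.add_child(Node("first", kind="attribute", value="Ann"))
phone = person.add_child(Node("HomePhone", value="555-0100"))
print(name.siblings() == [phone])  # True
```

Storing a single `parent` reference, rather than a list, makes the one-parent rule structural; `add_child` simply refuses to re-parent a node that is already attached.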
The combination of arbitrary nesting of ElementComponents, the nesting of AttributeComponents under ElementComponents, and the arbitrary designation of which nodes are to be considered key components may be used to define an index that can have any number of factors or keys. A "simple index" is one in which a single key component is identified; a "compound index" is one in which more than one key component is identified.
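To make the simple/compound distinction concrete, the sketch below builds key tuples from a list of key components: one component yields one-element keys, several yield multi-part keys. The path syntax, the functions `component_values` and `generate_keys`, and the rule that multiple matches produce one key per combination of values (a cartesian product) are assumptions for illustration only; an actual index definition may well use nesting context to pair values more selectively.

```python
import itertools
import xml.etree.ElementTree as ET


def component_values(root, path):
    """Values selected by one key component.

    Hypothetical path syntax: "HomePhone" selects element text,
    "Name/@last" selects an attribute, "@id" an attribute of the root.
    """
    if "@" in path:
        elem_path, _, attr = path.rpartition("@")
        elem_path = elem_path.rstrip("/") or "."
        return [e.get(attr) for e in root.findall(elem_path) if e.get(attr) is not None]
    return [(e.text or "").strip() for e in root.findall(path)]


def generate_keys(root, components):
    """One component gives a simple index; several give a compound index.

    Assumption: when a component matches several nodes, one key is emitted
    for every combination of values (a cartesian product). A component with
    no values therefore yields no keys at all.
    """
    value_lists = [component_values(root, p) for p in components]
    return sorted(itertools.product(*value_lists))


doc = ET.fromstring(
    "<Person>"
    "<Name first='Ann' last='Lee'/>"
    "<HomePhone>555-0100</HomePhone>"
    "<HomePhone>555-0199</HomePhone>"
    "</Person>"
)
print(generate_keys(doc, ["Name/@last"]))
# [('Lee',)]
print(generate_keys(doc, ["Name/@last", "HomePhone"]))
# [('Lee', '555-0100'), ('Lee', '555-0199')]
```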
Applying the index definition document 50 (
As noted above with reference to
As a practical matter, it will be recognized that it might have made more sense to nest the HomePhone elements 74a and 74b under the HomeAddress elements 72a and 72b, respectively, in the document 70 (
While the preceding description shows and describes one or more embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present disclosure. For example, various steps of the described methods may be executed in a different order or executed sequentially, combined, further divided, replaced with alternate steps, or removed entirely. In addition, various functions illustrated in the methods or described elsewhere in the disclosure may be combined to provide additional and/or alternate functions. Therefore, the claims should be interpreted in a broad manner, consistent with the present disclosure.
This application is related to commonly-owned U.S. patent application Ser. No. ______ (Atty. Docket No. IDR-921/26530.113) entitled SYSTEM AND METHOD FOR EFFICIENT MAINTENANCE OF INDEXES FOR XML FILES, filed on even date herewith and hereby incorporated by reference in its entirety.