Optimizing correlated XML extracts

Information

  • Patent Grant
  • 8073841
  • Patent Number
    8,073,841
  • Date Filed
    Friday, October 7, 2005
  • Date Issued
    Tuesday, December 6, 2011
Abstract
Queries that request fields that are contained in the same XML fragments are rewritten so that they execute more efficiently.
Description
RELATED APPLICATIONS

This application is related to U.S. patent application Ser. No. 10/428,878, entitled Techniques For Rewriting XML Queries Directed To Relational Database Constructs, filed by Anand Manikutty, et al. on May 1, 2003, referred to hereafter as the “Rewrite Application”, the contents of which are incorporated herein by reference as if originally set forth herein.


This application is related to U.S. application Ser. No. 10/884,311, entitled Index For Accessing XML Data, hereafter referred to as the “XML Index Application”, filed on July 2, 2004 by Sivasankaran Chandrasekar, the contents of which are herein incorporated by reference in their entirety for all purposes.


FIELD OF THE INVENTION

The present invention relates to executing queries that request XML data, and in particular, rewriting such queries to optimize their computation and/or execution.


BACKGROUND

The Extensible Markup Language (XML) is the standard for data and documents that is finding wide acceptance in the computer industry. XML describes and provides structure to a body of data, such as a file or data packet, referred to herein as a XML entity. The XML standard provides for tags that delimit sections of a XML entity referred to as XML elements. Each XML element may contain one or more name-value pairs referred to as attributes. The following XML Segment A is provided to illustrate XML.

SEGMENT A

    <book>My book
      <publication publisher="Doubleday"
          date="January"></publication>
      <Author>Mark Berry</Author>
      <Author>Jane Murray</Author>
    </book>

XML elements are delimited by a start tag and a corresponding end tag. For example, Segment A contains the start tag <Author> and the end tag </Author> to delimit an element. The data between the tags is referred to as the element's content. In the case of this element, the content is the text data Mark Berry.


Element content may contain various other types of data, which include attributes and other elements. The book element is an example of an element that contains one or more elements. Specifically, book contains two kinds of elements: publication and Author. An element that is contained by another element is referred to as a descendant of that element. Thus, elements publication and Author are descendants of element book. An element's attributes are also referred to as being contained by the element.


By defining an element that contains attributes and descendant elements, the XML entity defines a hierarchical tree relationship between the element, its descendant elements, and its attributes. A root node and a set of elements that descend from the root node are referred to herein as a XML document.


XML Data Models


An important standard for XML is the XQuery 1.0 and XPath 2.0 Data Model (see W3C Working Draft, 29 Oct. 2004), which is incorporated herein by reference and referred to hereinafter as the XQuery Data Model.


One aspect of XQuery Data Model is that XML data is represented by a hierarchy of nodes that reflects the hierarchical nature of the XML data. A hierarchy of nodes is composed of nodes at multiple levels. The nodes at each level are each linked to one or more nodes at a different level. Each node at a level below the top level is a child node of one or more of the parent nodes at the level above. Nodes at the same level are sibling nodes. In a tree hierarchy or node tree, each child node has only one parent node, but a parent node may have multiple child nodes. In a tree hierarchy, a node that has no parent node linked to it is the root node, and a node that has no child nodes linked to it is a leaf node. A tree hierarchy has a single root node.


In a node tree that represents a XML document, a node can correspond to an element, the child nodes of the node correspond to an attribute or another element contained in the element. The node may be associated with a name and value. For example, for a node tree representing the element book, the name of the node associated with element book is book, and the value is ‘My book’. For a node representing the attribute publisher, the name of the node is publisher and the value of the node is ‘Doubleday’.
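
The node tree for Segment A can be made concrete with a short sketch. It uses Python's standard xml.etree.ElementTree parser, which is an assumption of this illustration and not part of the described system; the helper name describe is hypothetical. Each element and attribute is listed as a node with a name and a value, indented by its level in the tree.

```python
import xml.etree.ElementTree as ET

# Segment A from the text, with plain ASCII quotes.
SEGMENT_A = """<book>My book
  <publication publisher="Doubleday"
      date="January"></publication>
  <Author>Mark Berry</Author>
  <Author>Jane Murray</Author>
</book>"""


def describe(node, depth=0):
    """Return (depth, kind, name, value) tuples for an element,
    its attributes, and all of its descendant elements."""
    value = (node.text or "").strip()
    rows = [(depth, "element", node.tag, value)]
    # Attributes are treated as child nodes of the element that contains them.
    for name, attr_value in node.attrib.items():
        rows.append((depth + 1, "attribute", name, attr_value))
    for child in node:
        rows.extend(describe(child, depth + 1))
    return rows


root = ET.fromstring(SEGMENT_A)
node_rows = describe(root)
for depth, kind, name, value in node_rows:
    print("  " * depth + f"{kind} {name} = {value!r}")
```

In this sketch the node named book carries the value 'My book', and the node named publisher carries the value 'Doubleday', matching the description above.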


For convenience of expression, elements and other parts of a XML document are referred to as nodes within a tree of nodes that represents the document. Thus, referring to ‘My book’ as the value of the node with name book is just a convenient way of expressing the value of the element associated with node book is My book.


A XML fragment is a portion of a XML document. A XML fragment can be a subtree within a XML document. A XML fragment may also be an attribute, an element, or a XML sequence (“sequence”) of elements that descend from a parent node but do not include the parent node.


Finally, the term XML value is used herein to refer to any value stored or represented by a XML document or parts thereof. A XML value may be a scalar value, such as the string value of an element and the numeric value of an attribute-value pair; a XML value may be a XML fragment, or a XML document. The term XML value refers to any value represented by the XQuery Data Model.


XML Storage and Query Mechanisms


Various types of storage mechanisms are used to store a XML document. One type of storage mechanism stores a XML document as a text file in a file system.


Another type of storage mechanism uses object-relational database systems that have been enhanced to store and query XML values. In an embodiment, a XML document is stored in a row of a table and nodes of the XML document are stored in separate columns in the row. An entire XML document may also be stored in a lob (large object). A XML document may also be stored as a hierarchy of objects in an object-relational database; each object is an instance of an object class and stores one or more elements of a XML document. The object class defines, for example, the structure corresponding to an element, and includes references or pointers to objects representing the immediate descendants of the element. Tables and/or objects of a database system that hold XML values are referred to herein as base tables or objects.


It is important for object-relational database systems that store XML values to be able to execute queries using XML query languages, such as XQuery/XPath. XML Query Language (“XQuery”) and XML Path Language (“XPath”) are important standards for a query language, which can be used in conjunction with SQL to express a large variety of useful queries. XPath is described in XML Path Language (XPath), version 1.0 (W3C Recommendation 16 Nov. 1999), which is incorporated herein by reference. XPath 2.0 and XQuery 1.0 are described in XQuery 1.0 and XPath 2.0 Full-Text (W3C Working Draft 9 Jul. 2004), which is incorporated herein by reference.


Queries that request XML values often request multiple XML values from the same XML document or fragment. However, the computation of such a query is performed in a way such that, for the particular row that holds the requested XML document or fragment, the row is accessed multiple times, once for each of the multiple XML values of the XML document. The following query QB is provided as an illustration.

    select
      extract(po, '/PurchaseOrder/Pono'),
      extract(po, '/PurchaseOrder/BillingAddress'),....
      from po_table;

Each row in base table po_table holds a XML document. For each such document, the row that contains the XML document is accessed for each of the Extract function invocations in query QB.
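
The repeated row access can be illustrated with a small simulation. The sketch below is not Oracle's implementation: CountingTable, read_row, and the sample purchase orders are hypothetical stand-ins chosen for this example, and each call to the hypothetical extract helper is assumed to read the row again, as the text describes for query QB.

```python
import xml.etree.ElementTree as ET


class CountingTable:
    """Hypothetical stand-in for po_table: one XML document per row,
    plus a counter of how many times any row has been read."""

    def __init__(self, documents):
        self._rows = list(documents)
        self.accesses = 0

    def __len__(self):
        return len(self._rows)

    def read_row(self, index):
        self.accesses += 1  # every read of a row counts as one access
        return ET.fromstring(self._rows[index])


def extract(table, index, path):
    """Mimic one Extract invocation: read the row, then evaluate the path."""
    doc = table.read_row(index)
    node = doc.find(path)
    return None if node is None else node.text


# Invented sample data, for illustration only.
po_table = CountingTable([
    "<PurchaseOrder><Pono>1</Pono><BillingAddress>12 Elm St</BillingAddress></PurchaseOrder>",
    "<PurchaseOrder><Pono>2</Pono><BillingAddress>9 Oak Ave</BillingAddress></PurchaseOrder>",
])

results = [
    (extract(po_table, i, "Pono"), extract(po_table, i, "BillingAddress"))
    for i in range(len(po_table))
]
print(results)
print("row accesses:", po_table.accesses)
```

With two rows and two requested values per row, the counter ends at four: each row is read once per requested value.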


Based on the foregoing, there is a need for a more efficient approach for computing queries that request XML values from a XML document.


The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in the BACKGROUND section.





BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:



FIG. 1 is a diagram of a query that requests multiple fields contained in the same XML fragment according to an embodiment of the present invention.



FIG. 2 is a flow chart of a procedure for rewriting a query according to an embodiment of the present invention.



FIG. 3 is a diagram of a query produced by the procedure depicted in FIG. 2 according to an embodiment of the present invention.



FIG. 4 is a block diagram of a computer system used to implement an embodiment of the present invention.





DETAILED DESCRIPTION

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.


Described herein are techniques that can optimize and reduce database accesses incurred to compute queries that request fields that are correlated. A field is an element (including a complex element), element attribute, object, object attribute, or column whose values are requested by a database query. In an embodiment, the fields requested are referenced or declared in the select-list of a database query that conforms to a standard of SQL, such as SQL/XML as defined in INCITS/ISO/IEC 9075-14:2003, which is incorporated herein by reference. However, an embodiment of the present invention is not limited to a particular standard of SQL, whether a non-proprietary standard or a proprietary standard, examples of the latter being those supported by database server products of Oracle Corporation.


A database query may request a field by declaring that the field's values are part of the results of the query; for example, a database query may simply reference a column's name in the select-list. A database query may request a field by specifying that the field is an input to a function whose output is declared to be part of the results requested by the database query. A database query may request an element as a field by requesting the element via an SQL/XML operator, such as the extract function.


Correlated fields are fields whose values are contained in the same XML fragment, that is, their values can be retrieved or derived from the same source XML fragment. Such a XML fragment is referred to herein as a “common source fragment” with respect to the correlated fields. For a given query, groups of correlated fields are determined, and the query is rewritten to generate and access a new and different data source for the correlated fields, referred to herein as a correlated field source. The correlated field source can be computed and/or accessed with fewer database accesses.



FIG. 1 depicts a query QI, which is provided to illustrate correlated fields and techniques for rewriting queries described later herein. Query QI requests fields from table auction_t. For purposes of illustration, each row of auction_t contains a XML document.


Query QI requests two fields: elements education and business, which correspond, respectively, to select-list expressions extractvalue(value(t), ‘/person/profile/education’) and extractvalue(value(t), ‘/person/profile/business’). These fields descend from and are thus contained in the same element, /site/people/person/profile. The two fields are correlated fields because, for a given XML document stored in auction_t, the values for the two fields can be retrieved from a common source fragment.
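
One simple way to picture how fields might be recognized as correlated is to group their paths by parent path. The sketch below is only an illustration under that simplifying assumption (the immediate parent is taken as the common source fragment); the function name group_correlated_fields and the third path, /person/name, are hypothetical and do not come from query QI.

```python
from collections import defaultdict


def group_correlated_fields(field_paths):
    """Group XPath-like field paths by parent path.

    Returns (correlated, uncorrelated): a dict mapping each parent path that
    is shared by two or more fields to the field names under it, and a list
    of paths that share a parent with no other requested field.
    """
    groups = defaultdict(list)
    for path in field_paths:
        parent, _, name = path.rstrip("/").rpartition("/")
        groups[parent].append(name)

    correlated = {p: names for p, names in groups.items() if len(names) > 1}
    uncorrelated = [
        f"{p}/{names[0]}" for p, names in groups.items() if len(names) == 1
    ]
    return correlated, uncorrelated


correlated, uncorrelated = group_correlated_fields([
    "/person/profile/education",
    "/person/profile/business",
    "/person/name",  # hypothetical extra field with no sibling in the query
])
print(correlated)
print(uncorrelated)
```

Here education and business fall into one group under /person/profile, while the hypothetical name field is left uncorrelated.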


To reduce database accesses, a query is rewritten to generate a correlated field source that contains all field values of an XML fragment, or at least more field values than just those of the correlated fields. For a given set of correlated fields, a common source fragment may contain more fields than are in the set; in fact, it may contain many more. It is possible that more computer resources may be expended to generate the correlated field source than are saved by a reduction in database accesses. To determine whether to rewrite a query to use a correlated field source, a “rewrite criterion” is used. A rewrite criterion is a criterion that is used to determine whether to rewrite a query. A rewrite criterion may, for example, indicate when the cost expended to compute a correlated field source that contains the fields of a common source fragment exceeds that of incurring a database access for each of the correlated fields. Rewrite criteria can be based on heuristics. In general, if the number of correlated fields in a database query constitutes a sufficiently large threshold portion of the total number of fields in the corresponding common source fragment, then the savings realized by rewriting are greater and a rewrite is merited.


For example, assume for a database query that M is the number of correlated fields for a common source fragment. For the common source fragment in query QI, M is 2.


Further assume that N is the total number of fields in a common source fragment that corresponds to the correlated fields. If the XML documents stored in auction_t include four elements (without any attributes) in element ‘/site/people/person/profile’, then N is four.


If a correlated field source is generated and used to retrieve the correlated fields, then one database row is accessed to generate the correlated field source for each common source fragment, assuming that the common source fragment or its document is completely stored in the row. Without rewriting to use the correlated field source, a database row can be accessed M times. In the case of query QI, a row in auction_t that stores a common source fragment is accessed twice.


The ratio of M/N reflects what portion of a common source fragment the correlated fields constitute. The M/N ratio in the current example is 0.5. If, however, the descendant elements of ‘/site/people/person/profile’ included attributes such that N is equal to 20, then the M/N ratio is lower, at 0.1.


In general, the higher the M/N ratio, the greater the savings potential of a correlated field source rewrite. In an embodiment of the present invention, a ratio of M/N that is 0.5 or greater may satisfy rewrite criteria. A ratio as low as 0.1 may not.
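
The M/N heuristic can be written down as a small predicate. This is a minimal sketch, not the patented criterion itself: the function name satisfies_rewrite_criterion is hypothetical, and the default threshold of 0.5 is an assumption taken from the example values above, where a ratio of 0.5 may qualify and a ratio of 0.1 may not.

```python
def satisfies_rewrite_criterion(m, n, threshold=0.5):
    """Return True when the correlated fields make up at least `threshold`
    of the fields in the common source fragment.

    m: number of correlated fields requested by the query
    n: total number of fields in the common source fragment
    threshold: assumed cutoff for the M/N ratio (illustrative default)
    """
    if n <= 0 or m < 0 or m > n:
        raise ValueError("expected 0 <= m <= n and n > 0")
    return m / n >= threshold


# Query QI with four fields in the fragment: M/N = 2/4 = 0.5.
print(satisfies_rewrite_criterion(2, 4))
# Same query when the fragment holds twenty fields: M/N = 2/20 = 0.1.
print(satisfies_rewrite_criterion(2, 20))
```

A real optimizer would likely combine such a ratio with other cost estimates; the single threshold here only mirrors the two example ratios in the text.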


Illustrative Embodiment

According to an embodiment of the present invention, database queries are received by a database system and rewritten to generate a correlated field source and to retrieve correlated field values from the correlated field source. A database system typically comprises one or more clients that are communicatively coupled to a server that is connected to a shared database. “Server” may refer collectively to a cluster of server instances and machines on which the instances execute. Generally, a server is a combination of integrated software components and an allocation of computational resources, such as memory, a node, and processes on the node for executing the integrated software components on a processor, where the combination of the software and computational resources is dedicated to providing a particular type of function on behalf of clients of the server. Among other functions of database management, a database server governs and facilitates access to a particular database, processing requests by clients to access the database.


A database server retrieves and manipulates data in response to receiving a database statement. Typically, the database statement conforms to a database language, such as SQL. A database statement can specify a query operation, a data manipulation operation, or a combination thereof. For some, only database statements that specify a query operation are referred to as a database query. As the term is used herein, a database query is not limited to database statements that specify a particular type of operation. Database queries include those that specify data manipulation operations.


A database comprises data and metadata that is stored on a persistent memory mechanism, such as a set of hard disks. Such data and metadata may be stored in a database logically, for example, according to relational and/or object-relational database constructs.


When performing a database access to access a data item from a persistent memory mechanism, the data item is read into a buffering system in the shared volatile and/or virtual memory of a database system. Subsequent accesses to the data may be made by accessing the data in the buffering system with or without having to first read the data from the persistent storage mechanism. Thus, a database access, as the term is used herein, may or may not entail an access to a persistent storage mechanism and may entail access only to the buffering system. It should be noted that even without having to read from a persistent memory mechanism to perform a database access, access to a buffering system itself can entail significant overhead in the form of, for example, contention processing to manage access to data in the buffering system.


When the database system stores a XML document in a database, it generates statistics about the XML document and XML fragments contained therein. Such statistics may include, for example, information that indicates the number of fields in a particular set of XML fragments of a set of XML documents, or fan-out, that is, the number of descendants of various elements or the number of siblings within levels of the XML documents.


When a database system receives a query, a query engine parses the database query and determines how it can be optimized and executed. Optimization includes rewriting a database query to generate a rewritten query that is equivalent but can be executed more efficiently. Queries are equivalent if their computation yields equivalent results. The following procedure is followed by a database server to determine whether and how to rewrite a database query to produce a rewritten query that materializes a source XML document and retrieves correlated field values therefrom.


Procedure for Query Rewrite



FIG. 2 shows a procedure for rewriting a query. The procedure is illustrated using query QI and rewritten query QI′, shown in FIG. 3.


At step 205, XML fragments that are a common source fragment for correlated fields are identified. In query QI, the element ‘/person/profile/’ is identified as a common source fragment for the pair of elements ‘education’ and ‘business’.


At step 210, fields are grouped as correlated fields based on common source fragments. At step 215, fields that have not been correlated to a common source fragment (“uncorrelated field”) are grouped into another group. In the current example, elements ‘education’ and ‘business’ share a common source and are therefore grouped together as correlated fields.


For each group of correlated fields, at step 220 it is determined whether rewrite criteria are satisfied. If the rewrite criteria are satisfied, then steps 225 and 230 are performed for that group. For purposes of illustration, the group of correlated fields consisting of elements ‘education’ and ‘business’ is assumed to satisfy the rewrite criteria.


At step 225, query rewrite operations are performed to provide for the group a correlated field source that can require fewer database accesses to compute. At step 230, query rewrite operations are performed so that the correlated field values are retrieved from the correlated field source.


In the current example, query QI is rewritten to query QI′ as shown in FIG. 3. The from-list of query QI has been rewritten to specify a correlated field source in the form of the result set of subquery QI′sub. The result set includes name-value pairs, i.e., an element's name and value in each row of the result set. In the result set, columns elemname and elemvalue correspond, respectively, to the name and the value of each name-value pair.


The select-list of query QI′ has also been modified to access these columns and retrieve the correlated fields from the result set. The result set is stored as interim data during the computation of QI′sub, typically in volatile memory, and can be accessed without entailing a database access.
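
The idea of a correlated field source made of name-value pairs can be sketched outside SQL. The code below is not the query of FIG. 3; it is a Python illustration under stated assumptions: the sample person document is invented, the function names correlated_field_source and select_fields are hypothetical, and the tuple positions stand in for the elemname and elemvalue columns.

```python
import xml.etree.ElementTree as ET


def correlated_field_source(document_text, fragment_path):
    """Read a document once and return (elemname, elemvalue) pairs for
    every child element of the common source fragment."""
    doc = ET.fromstring(document_text)  # the single access to the document
    fragment = doc.find(fragment_path)
    if fragment is None:
        return []
    return [(child.tag, (child.text or "").strip()) for child in fragment]


def select_fields(pairs, wanted):
    """Pick requested field values from the in-memory name-value pairs,
    without reading the stored document again."""
    lookup = dict(pairs)  # assumes element names are unique in the fragment
    return tuple(lookup.get(name) for name in wanted)


# Invented sample document with four fields under profile (N = 4).
person = """<person>
  <profile>
    <interest>stamps</interest>
    <education>Graduate School</education>
    <business>Yes</business>
    <age>34</age>
  </profile>
</person>"""

pairs = correlated_field_source(person, "profile")
education, business = select_fields(pairs, ["education", "business"])
print(pairs)
print(education, business)
```

The pairs include fields that the query never asked for (interest and age in this sample), yet both requested values are obtained after parsing the document a single time.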


The function sys_xmlnode_getokey( ) of the group by clause returns an “order key”, which is a value representing the hierarchical position of a node in an XML document. For example, if a node is the 3rd child of the 5th child of the root, its order key is 1.5.3. Further details about order keys may be found in the XML Index Application.
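
The dotted order key in the example can be reproduced with a short traversal. This sketch is not sys_xmlnode_getokey( ) and makes its own assumptions: the root is given the key 1, only element children are counted, positions start at 1, and the document and the name order_keys are invented for the illustration.

```python
import xml.etree.ElementTree as ET


def order_keys(root):
    """Map each element to a dotted order key. The root is "1" and each
    child appends its 1-based position among its parent's child elements."""
    keys = {}

    def visit(node, key):
        keys[node] = key
        for position, child in enumerate(node, start=1):
            visit(child, f"{key}.{position}")

    visit(root, "1")
    return keys


# Invented document: the root has five children; the fifth has three.
doc = ET.fromstring("<r><a/><b/><c/><d/><e><x/><y/><z/></e></r>")
keys = order_keys(doc)

target = doc[4][2]  # 3rd child of the 5th child of the root
print(target.tag, keys[target])
```

For the node selected here, the computed key is 1.5.3, the same value as in the example above.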


For a common source fragment having values in the result set, all field values of the common source fragment are stored as name-value pairs in the result set, including fields not requested by the select-list of query QI. Even though all these fields are generated and stored, the field values are computed with fewer database accesses per common source fragment.


According to an embodiment, the new data source for a database query may be computed more efficiently when the common source fragments accessed are indexed by a XML index. One such index is the XML index described in the XML Index Application, which indexes nodes in each XML document of a XML collection.


After executing the procedure depicted in FIG. 2, the database query may be further rewritten to further optimize its execution. Such rewriting can include rewriting operations described in the Rewrite Application. During such rewrites, XQuery/XPath queries received by a database server that specify XQuery/XPath operations are dynamically rewritten into object-relational queries that directly reference and access the underlying base tables that store XML documents. Specific techniques for implementing the rewrite approach are described in the Rewrite Application.


Hardware Overview



FIG. 4 is a block diagram that illustrates a computer system 400 upon which an embodiment of the invention may be implemented. Computer system 400 includes a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information. Computer system 400 also includes a main memory 406, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404. Main memory 406 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404. Computer system 400 further includes a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404. A storage device 410, such as a magnetic disk or optical disk, is provided and coupled to bus 402 for storing information and instructions.


Computer system 400 may be coupled via bus 402 to a display 412, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 414, including alphanumeric and other keys, is coupled to bus 402 for communicating information and command selections to processor 404. Another type of user input device is cursor control 416, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.


The invention is related to the use of computer system 400 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in main memory 406. Such instructions may be read into main memory 406 from another machine-readable medium, such as storage device 410. Execution of the sequences of instructions contained in main memory 406 causes processor 404 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.


The term “machine-readable medium” as used herein refers to any medium that participates in providing data that causes a machine to operate in a specific fashion. In an embodiment implemented using computer system 400, various machine-readable media are involved, for example, in providing instructions to processor 404 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 410. Volatile media includes dynamic memory, such as main memory 406. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 402. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications. All such media must be tangible to enable the instructions carried by the media to be detected by a physical mechanism that reads the instructions into a machine.


Common forms of machine-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.


Various forms of machine-readable media may be involved in carrying one or more sequences of one or more instructions to processor 404 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 400 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 402. Bus 402 carries the data to main memory 406, from which processor 404 retrieves and executes the instructions. The instructions received by main memory 406 may optionally be stored on storage device 410 either before or after execution by processor 404.


Computer system 400 also includes a communication interface 418 coupled to bus 402. Communication interface 418 provides a two-way data communication coupling to a network link 420 that is connected to a local network 422. For example, communication interface 418 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 418 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 418 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.


Network link 420 typically provides data communication through one or more networks to other data devices. For example, network link 420 may provide a connection through local network 422 to a host computer 424 or to data equipment operated by an Internet Service Provider (ISP) 426. ISP 426 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 428. Local network 422 and Internet 428 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 420 and through communication interface 418, which carry the digital data to and from computer system 400, are exemplary forms of carrier waves transporting the information.


Computer system 400 can send messages and receive data, including program code, through the network(s), network link 420 and communication interface 418. In the Internet example, a server 430 might transmit a requested code for an application program through Internet 428, ISP 426, local network 422 and communication interface 418.


The received code may be executed by processor 404 as it is received, and/or stored in storage device 410, or other non-volatile storage for later execution. In this manner, computer system 400 may obtain application code in the form of a carrier wave.


In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims
  • 1. A method comprising: evaluating a first database query that requests a plurality of requested fields from one or more XML documents stored in a database; wherein said query requests a respective value for each requested field of the plurality of requested fields and includes an XPATH expression that identifies said each requested field; wherein the evaluating includes: identifying the plurality of requested fields as correlated fields of a common XML fragment, the respective value of each requested field of said plurality of requested fields being contained within said common XML fragment; and determining whether one or more rewrite criteria are satisfied based on the correlated fields; and if said one or more rewrite criteria are satisfied, then rewriting said first database query to generate a rewritten query that includes a new data source that contains field values from said common XML fragment, wherein the field values from said common XML fragment include the respective value of said each requested field and at least one field value not requested by said database query; wherein the method is performed by one or more computing devices.
  • 2. The method of claim 1, wherein the new data source includes name-value pairs that represent said certain values.
  • 3. The method of claim 1, wherein the first database query conforms to a standard of SQL.
  • 4. The method of claim 1, wherein the first database query includes a select-list that references said correlated fields but not said at least one field value.
  • 5. The method of claim 1, wherein: the first database query requests another field as part of the results to compute for the first database query; and wherein the rewritten query specifies a second data source, different than said new data source, for said another field.
  • 6. The method of claim 1, wherein: said database includes an index that indexes nodes of a collection of XML documents; and wherein execution of the rewritten query uses said index to access said XML documents.
  • 7. The method of claim 1, wherein rewriting said first database query includes rewriting said first database query to access base tables in the database that store XML documents.
  • 8. The method of claim 1, wherein determining whether one or more rewrite criteria is satisfied is based on statistics generated about XML documents stored in the database.
  • 9. The method of claim 1, wherein determining whether one or more rewrite criteria is satisfied is based on the number of correlated fields and a number of fields contained in said XML fragment.
  • 10. A non-transitory machine-readable storage medium storing one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the following steps: evaluating a first database query that requests a plurality of requested fields from one or more XML documents stored in a database; wherein said query requests a respective value for each requested field of the plurality of requested fields and includes an XPATH expression that identifies said each requested field; wherein the evaluating includes: identifying the plurality of requested fields as correlated fields of a common XML fragment, the respective value of each requested field of said plurality of requested fields being contained within said common XML fragment; and determining whether one or more rewrite criteria are satisfied based on the correlated fields; and if said one or more rewrite criteria are satisfied, then rewriting said first database query to generate a rewritten query that includes a new data source that contains field values from said common XML fragment, wherein the field values from said common XML fragment include the respective value of said each requested field and at least one field value not requested by said database query.
  • 11. The machine-readable storage medium of claim 10, wherein the new data source includes name-value pairs that represent said certain values.
  • 12. The machine-readable storage medium of claim 10, wherein the first database query conforms to a standard of SQL.
  • 13. The machine-readable storage medium of claim 10, wherein the first database query includes a select-list that references said correlated fields but not said at least one field value.
  • 14. The machine-readable storage medium of claim 10, wherein: the first database query requests another field as part of the results to compute for the first database query; and wherein the rewritten query specifies a second data source, different than said new data source, for said another field.
  • 15. The machine-readable storage medium of claim 10, wherein: said database includes an index that indexes nodes of a collection of XML documents; and wherein execution of the rewritten query uses said index to access said XML documents.
  • 16. The machine-readable storage medium of claim 10, wherein rewriting said first database query includes rewriting said first database query to access base tables in the database that store XML documents.
  • 17. The machine-readable storage medium of claim 10, wherein determining whether one or more rewrite criteria is satisfied is based on statistics generated about XML documents stored in the database.
  • 18. The machine-readable storage medium of claim 10, wherein determining whether one or more rewrite criteria is satisfied is based on the number of correlated fields and a number of fields contained in said XML fragment.
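
The evaluation recited in claims 1 and 9 (identify requested fields that share a common XML fragment, then test a rewrite criterion based on the number of correlated fields and the number of fields in the fragment) may be illustrated by the following minimal Python sketch. It is an illustration only and not the claimed implementation: the function names, the rule that a field's fragment is the parent of its XPATH expression, the `min_fraction` threshold, and the sample paths are all assumptions introduced here.

```python
from collections import defaultdict


def common_fragment(xpath: str) -> str:
    """Return the parent path of a field (assumed rule: the fragment is the parent step)."""
    return xpath.rsplit("/", 1)[0]


def plan_rewrite(requested_xpaths, fragment_field_counts, min_fraction=0.5):
    """Decide, per fragment, whether a rewrite to a single new data source looks worthwhile.

    requested_xpaths: XPATH strings named in the query's select-list.
    fragment_field_counts: fragment path -> total number of fields in that fragment
        (a stand-in for statistics gathered about the stored XML documents).
    min_fraction: hypothetical threshold on requested fields / total fields.
    """
    # Group the requested fields by the fragment that contains them.
    groups = defaultdict(list)
    for xpath in requested_xpaths:
        groups[common_fragment(xpath)].append(xpath)

    plan = {}
    for fragment, fields in groups.items():
        total = fragment_field_counts.get(fragment, 0)
        # Fields are correlated only if at least two of them share the fragment.
        correlated = len(fields) >= 2
        # Criterion: the requested fields cover enough of the fragment's fields.
        worthwhile = total > 0 and len(fields) / total >= min_fraction
        plan[fragment] = correlated and worthwhile
    return plan
```

For example, with the hypothetical paths `/PurchaseOrder/Item/Price`, `/PurchaseOrder/Item/Quantity` and `/PurchaseOrder/Reference`, and assumed counts of 3 fields under `/PurchaseOrder/Item` and 10 under `/PurchaseOrder`, only the `/PurchaseOrder/Item` fragment is marked for rewrite. The `/PurchaseOrder/Reference` field has no correlated sibling and would be served from a second data source, as in claim 5.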
Related Publications (1)
Number Date Country
20070083809 A1 Apr 2007 US