Claims
- 1. A method for generating a result for a query of a document of elements using pre-computed step queries and pre-computed step query results stored in a database, the method comprising:
receiving the query, wherein the query comprises a path of elements in the document of elements;
reducing the query into a plurality of step queries, wherein a step query comprises a relationship between a plurality of elements determined from a part of the path of elements;
for each of the plurality of step queries, retrieving a pre-computed step query result for a step query in the plurality of step queries by querying the database using the step query, wherein the step query corresponds to a pre-computed step query for the pre-computed step query result; and
generating the result for the query using the step query results.
- 2. The method of claim 1, wherein generating the result comprises taking the intersection of the step query results.
- 3. The method of claim 1, wherein the result of the query comprises a location in the document of elements that includes the path of elements for the query.
- 4. The method of claim 1, wherein the result of the query comprises the path of elements for the query.
- 5. The method of claim 1, further comprising optimizing the query, wherein optimizing the query comprises generating sequences from the path of elements that interpolate the path.
- 6. The method of claim 1, wherein the plurality of step queries comprise at least one of a one-step query, two-step query, three-step query, and four-step query.
- 7. The method of claim 1, wherein reducing the query into the plurality of step queries comprises reducing the query into at least one two-step query.
- 8. The method of claim 1, wherein reducing the query into the plurality of step queries comprises reducing the query into at least one three-step query.
- 9. The method of claim 1, further comprising:
computing a hash key for queries in the pre-computed step queries and the plurality of step queries; and
storing the hash keys for the pre-computed step queries and the corresponding pre-computed step query results in the database.
- 10. The method of claim 9, wherein retrieving the pre-computed step query result comprises using the stored hash keys for the step queries to retrieve the pre-computed step query results corresponding to the hash keys.
- 11. The method of claim 9, wherein the step query results comprise an ID for one or more elements in the document of elements.
- 12. The method of claim 9, further comprising post-processing the intersection of the step query results to generate the result for the query.
- 13. The method of claim 12, wherein post-processing the result comprises matching each step query in the step query results to the query.
- 14. The method of claim 9, wherein the relationship between the plurality of elements comprises a parent/child relationship.
- 15. The method of claim 9, wherein the document of elements comprises an XML document.
- 16. The method of claim 9, wherein elements in the document of elements comprise at least one of element, word, attribute, and string elements.
- 17. A method for creating a database of step queries and step query results for a document of elements, the method comprising:
determining relationships between a plurality of elements from the document of elements;
generating step queries from the relationships;
generating step query results for the step queries, wherein a step query result for a step query corresponds to one or more elements in the document of elements for the step query; and
storing the step queries and corresponding step query results in the database, wherein the stored step query results are usable to generate a result for a main query, wherein the main query can be reduced to a plurality of step queries that correspond to the stored step queries.
- 18. The method of claim 17, further comprising generating an index for the step queries, the index pointing to the corresponding step query results for each step query.
- 19. The method of claim 17, wherein the step query results comprise an ID for one or more elements in the document of elements.
- 20. The method of claim 17, wherein the plurality of step queries and corresponding step query results are stored in a PostingList.
- 21. The method of claim 17, wherein the step queries comprise at least one of a one-step query, two-step query, three-step query, and four-step query.
- 22. The method of claim 17, wherein the document of elements comprises an XML document.
- 23. The method of claim 17, wherein elements in the document of elements comprise at least one of element, word, attribute, and string elements.
- 24. The method of claim 17, wherein generating step queries comprises reducing a step query to a canonical form.
- 25. The method of claim 17, wherein the relationship between the plurality of elements comprises a parent/child relationship.
- 26. The method of claim 17, wherein storing the step queries comprises:
generating a hash key for every step query; and
storing the hash key for the step queries in the database.
- 27. A query processor for processing a query for a document of elements, the processor comprising:
a document processor configured to receive the document of elements and pre-compute a plurality of step queries and corresponding step query results from the document of elements;
a database for storing the pre-computed plurality of step queries and corresponding step query results; and
a query processor configured to receive the query, generate a plurality of step queries from the query, and generate a result for the query using the step query results retrieved from the database that correspond to the plurality of step queries.
- 28. The query processor of claim 27, wherein the document processor comprises a step query generator configured to generate the pre-computed plurality of step queries.
- 29. The query processor of claim 27, wherein the document processor comprises a canonicalizer configured to generate a canonical form of the pre-computed plurality of step queries.
- 30. The query processor of claim 27, wherein the document processor comprises a hash key generator configured to generate a hash key for each of the pre-computed plurality of step queries.
- 31. The query processor of claim 27, wherein the document processor comprises a step query result generator configured to generate step query results for the pre-computed plurality of step queries.
- 32. The query processor of claim 27, wherein the step query results comprise one or more identifiers corresponding to one or more elements in the document of elements.
- 33. The query processor of claim 27, wherein the query processor comprises an optimizer configured to optimize the query.
- 34. The query processor of claim 27, wherein the query processor comprises a step query generator configured to generate a plurality of step queries from the query.
- 35. The query processor of claim 27, wherein the query processor comprises a composer configured to retrieve, from the database, the step query results that correspond to the plurality of step queries.
- 36. The query processor of claim 27, wherein the query processor comprises an intersector configured to take the intersection of step query results retrieved from the database that correspond to the plurality of step queries.
- 37. The query processor of claim 27, wherein the document of elements comprises an XML document.
- 38. A method for processing queries for a document of elements, the document including a plurality of subsections, each subsection including at least a portion of elements in the document, the method comprising:
receiving a query for a path of elements in the document of elements;
determining a plurality of step queries from the query, each step query including at least a part of the path of elements;
for each step query in the plurality of step queries, determining one or more subsections that include elements that correspond to a step query; and
determining at least one subsection that includes the path of elements of the query.
- 39. The method of claim 38, further comprising generating a result for the query using the at least one subsection.
- 40. The method of claim 39, wherein the result comprises a location where the path of elements is stored.
- 41. The method of claim 39, wherein the result comprises the path of elements.
- 42. The method of claim 38, wherein determining one or more subsections comprises determining a subsection identifier for each of the one or more subsections.
- 43. The method of claim 38, further comprising:
determining a relevance score for each of the one or more determined subsections; and
using the relevance scores for the one or more determined subsections to determine the at least one subsection that includes the path of elements.
- 44. The method of claim 38, further comprising verifying, using the query, that the at least one subsection determined to include the path of elements actually includes the path of elements.
- 45. The method of claim 38, further comprising:
determining, for each of the one or more determined subsections, how many times an instance of a step query appears in a subsection; and
using the frequency to determine the at least one subsection that includes the path of elements.
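The flow recited in claims 1, 2, 9, 12, 17 and 38 can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the implementation described in this application: it handles only two-step (parent/child) step queries over element names, it uses SHA-1 as a stand-in hash function, it models the database as an in-memory dictionary, and every name in it (`step_key`, `build_index`, `reduce_query`, `candidates`, `verify`, the sample subsections) is hypothetical.

```python
from __future__ import annotations

import hashlib
import xml.etree.ElementTree as ET


def step_key(parent: str, child: str) -> str:
    """Hash key for a two-step (parent/child) step query. SHA-1 is an assumption."""
    return hashlib.sha1(f"{parent}/{child}".encode("utf-8")).hexdigest()


def build_index(subsections: dict[str, str]) -> dict[str, set[str]]:
    """Pre-compute step queries: map each hash key to the IDs of subsections containing it."""
    index: dict[str, set[str]] = {}
    for sub_id, xml_text in subsections.items():
        root = ET.fromstring(xml_text)
        for parent in root.iter():
            for child in parent:
                index.setdefault(step_key(parent.tag, child.tag), set()).add(sub_id)
    return index


def reduce_query(path: str) -> list[tuple[str, str]]:
    """Reduce a path such as 'book/chapter/title' into overlapping two-step queries."""
    names = [name for name in path.split("/") if name]
    return list(zip(names, names[1:]))


def candidates(index: dict[str, set[str]], path: str) -> set[str]:
    """Look up each step query by hash key and intersect the stored results."""
    steps = reduce_query(path)
    if not steps:
        return set()
    results = [index.get(step_key(p, c), set()) for p, c in steps]
    return set.intersection(*results)


def verify(xml_text: str, path: str) -> bool:
    """Post-process a candidate: check that the full path really occurs in it."""
    names = [name for name in path.split("/") if name]

    def match(node: ET.Element, i: int) -> bool:
        if node.tag != names[i]:
            return False
        if i == len(names) - 1:
            return True
        return any(match(child, i + 1) for child in node)

    return any(match(node, 0) for node in ET.fromstring(xml_text).iter())


# Hypothetical subsections keyed by subsection identifier.
subsections = {
    "s1": "<book><chapter><title>A</title></chapter></book>",
    "s2": "<lib><book><chapter/></book><chapter><title>B</title></chapter></lib>",
    "s3": "<book><title>C</title></book>",
}
index = build_index(subsections)
hits = candidates(index, "book/chapter/title")
confirmed = sorted(s for s in hits if verify(subsections[s], "book/chapter/title"))
```

In this sample, the intersection alone returns both `s1` and `s2`, because `s2` contains the pairs book/chapter and chapter/title on different branches. The verification pass narrows the result to `s1`, which is why a post-processing step of the kind recited in claims 12 and 44 is useful after intersecting.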
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 60/389,066, filed Jun. 13, 2002, entitled “PARENT-CHILD QUERY INDEXING FOR XML DATABASES,” which disclosure is incorporated herein by reference for all purposes. The present disclosure is related to the following commonly assigned co-pending U.S. Patent Applications:
[0002] No. ______ (Attorney Docket No. 021512 000110US), filed on the same date as the present application, entitled “A SUBTREE STRUCTURED XML DATABASE” (hereinafter “Lindblad I-A”);
[0003] No. ______ (Attorney Docket No. 021512 000310US), filed on the same date as the present application, entitled “XML DB TRANSACTIONAL UPDATE SYSTEM” (hereinafter “Lindblad III-A”); and
[0004] No. ______ (Attorney Docket No. 021512 000410US), filed on the same date as the present application, entitled “XML DATABASE MIXED STRUCTURAL-TEXTUAL CLASSIFICATION SYSTEM” (hereinafter “Lindblad IV-A”).
[0005] The respective disclosures of these applications are incorporated herein by reference for all purposes.
Provisional Applications (1)
| Number | Date | Country |
| 60389066 | Jun 2002 | US |