Efficient search space analysis for join factorization

Information

  • Patent Application
  • 20070219977
  • Publication Number
    20070219977
  • Date Filed
    March 08, 2007
  • Date Published
    September 20, 2007
Abstract
Under a type of query transformation referred to herein as join factorization, the branches of a UNION/UNION ALL query that join a common table are combined to reduce accesses to the common table. The transformation can be expressed as (T1 join T2) union all (T1 join T3) = T1 join (T2 union all T3), where T1, T2 and T3 are three tables. A given query may be rewritten in many alternate ways using join factorization. Evaluating each alternative can be expensive. Therefore, the alternatives are generated and evaluated in a way that minimizes the cost of evaluating the alternatives.
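The identity stated in the abstract can be checked on a small data set. The following is a minimal sketch only, assuming an in-memory SQLite database; the table layouts, column names, and sample rows are hypothetical and are not taken from the application.

```python
import sqlite3

# Hypothetical tables t1, t2, t3 sharing a join column "id".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE t1 (id INTEGER, a TEXT);
CREATE TABLE t2 (id INTEGER, b TEXT);
CREATE TABLE t3 (id INTEGER, b TEXT);
INSERT INTO t1 VALUES (1, 'x'), (2, 'y'), (2, 'z');
INSERT INTO t2 VALUES (1, 'p'), (2, 'q');
INSERT INTO t3 VALUES (2, 'r'), (3, 's');
""")

# Base query: (T1 join T2) union all (T1 join T3).
# T1 is referenced once per branch.
original = """
SELECT t1.a, t2.b FROM t1 JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT t1.a, t3.b FROM t1 JOIN t3 ON t1.id = t3.id
"""

# Factorized query: T1 join (T2 union all T3).
# T1 is referenced only once.
factorized = """
SELECT t1.a, v.b
FROM t1
JOIN (SELECT id, b FROM t2 UNION ALL SELECT id, b FROM t3) v
  ON t1.id = v.id
"""

# UNION ALL imposes no row order, so compare the results as sorted lists.
original_rows = sorted(cur.execute(original).fetchall())
factorized_rows = sorted(cur.execute(factorized).fetchall())

print(original_rows == factorized_rows)  # prints True
```

Both forms return the same multiset of rows here, including duplicates. Whether the factorized form is actually cheaper depends on the data and the available access paths, which is why the alternatives are compared by cost rather than applied unconditionally.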
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:



FIG. 1 is a diagram of a query optimizer according to an embodiment of the present invention.



FIG. 2 is a flow chart depicting a procedure for generating join factorization units according to an embodiment of the present invention.



FIG. 3A is a flow chart depicting a procedure for search space analysis according to an embodiment of the present invention.



FIG. 3B is a flow chart depicting a procedure for search space analysis according to an embodiment of the present invention.



FIG. 4 is a diagram of a computer system that may be used in an implementation of an embodiment of the present invention.


Claims
  • 1. A computer implemented method, comprising generating a plurality of units that each correspond to a set of base branches of a plurality of base branches in a base query; wherein each unit of said plurality of units represents a factorization of a common table set involving a common table joined in each branch of the respective set of base branches; generating a certain plurality of states that conform to one or more criteria, wherein each state of said certain plurality of states corresponds to a combination of one or more units of said plurality of units, and a query transformation according to the one or more factorizations represented by the combination of one or more units; generating costs for at least a subset of states of said certain plurality of states; and making a comparison of the costs of the subset of states to select a certain state of said certain plurality of states.
  • 2. The method of claim 1, wherein said one or more criteria include that a state is not formed by combining two or more units, wherein a unit of said two or more units is associated with a state that has a cost that exceeds the cost of another state of said certain plurality of states.
  • 3. The method of claim 1, wherein said one or more criteria include that a state does not comprise units that share a base branch of said plurality of base branches.
  • 4. The method of claim 1, wherein said one or more criteria include that a state does not comprise units that contain an identical common table set.
  • 5. The method of claim 1, wherein said one or more criteria include that a state not include a unit of said plurality of units that requires a Cartesian product to compute a join of the tables in the respective common table set.
  • 6. The method of claim 1, wherein said one or more criteria include that a state not include a particular unit of said plurality of units that requires a Cartesian product that would not otherwise be required without factorizing the respective common table set of said particular unit.
  • 7. The method of claim 1, wherein the step of generating costs includes: generating a cost for a state of said certain plurality of states, wherein computing a cost for a state includes computing a first cost for a query block corresponding to a unit of said plurality of units included in the state; and generating a cost for a second state of said certain plurality of states by adding a particular cost to the already computed first cost.
  • 8. The method of claim 1, wherein generating a certain plurality of states includes generating a first plurality of first states comprising only one unit of said plurality of units; and wherein the steps further include discarding one or more units of a first state of said plurality of states that has a cost greater than a cost associated with said base query.
  • 9. The method of claim 8, where the steps include performing after discarding one or more units: establishing a first state of said first plurality of first states as a lowest cost state; generating a second plurality of second states comprising at least two units that have not been discarded of said plurality of units; making a comparison of a cost of a second state of said second plurality of second states to a cost of the lowest cost state; and based on the comparison, determining whether to establish the second state as the lowest cost state.
  • 10. The method of claim 1, wherein generating a certain plurality of states includes: establishing a particular state of said certain plurality of states as a lowest cost state; making a comparison of the cost of another state of said certain plurality of states to the lowest cost state; and based on the comparison, discarding said another state from the certain plurality of states.
  • 11. A computer implemented method, comprising generating a plurality of units that each correspond to a set of base branches of a plurality of base branches in a base query; wherein each unit of said plurality of units represents a factorization of a common table set comprising a common table joined by each of the respective set of base branches; generating costs for combinations of the one or more units of said plurality of units; and determining how to transform the base query based on a comparison of the costs.
  • 12. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 1.
  • 13. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 2.
  • 14. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 3.
  • 15. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 4.
  • 16. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 5.
  • 17. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 6.
  • 18. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 7.
  • 19. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 8.
  • 20. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 9.
  • 21. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 10.
  • 22. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in claim 11.
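The search over units and states recited in claims 1, 3, 4, 8 and 9 can be illustrated with a short sketch. This is an illustration under stated assumptions, not the claimed implementation: the names `Unit`, `compatible`, `search` and `toy_cost` are hypothetical, and the cost model (a fixed base cost minus a per-unit saving) is a stand-in for an optimizer's real cost estimate of each transformed query.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Unit:
    """One candidate factorization (hypothetical representation)."""
    branches: FrozenSet[int]  # indices of the base branches it covers
    tables: FrozenSet[str]    # the common table set it factors out


State = Tuple[Unit, ...]


def compatible(units: State) -> bool:
    """Criteria of claims 3 and 4: no shared base branch and no
    identical common table set between any two units of a state."""
    for u, v in combinations(units, 2):
        if u.branches & v.branches or u.tables == v.tables:
            return False
    return True


def search(units: List[Unit], base_cost: float,
           cost_of: Callable[[State], float]) -> Tuple[State, float]:
    # Start with the untransformed base query as the lowest cost state.
    best_state, best_cost = (), base_cost
    kept: List[Unit] = []
    # Claim 8: cost the single-unit states and discard any unit whose
    # state costs more than the base query.
    for u in units:
        c = cost_of((u,))
        if c > base_cost:
            continue
        kept.append(u)
        if c < best_cost:
            best_state, best_cost = (u,), c
    # Claim 9: build states of two or more surviving units and compare
    # each against the current lowest cost state.
    for size in range(2, len(kept) + 1):
        for combo in combinations(kept, size):
            if not compatible(combo):
                continue
            c = cost_of(combo)
            if c < best_cost:
                best_state, best_cost = combo, c
    return best_state, best_cost


# Hypothetical base query with four branches (0..3) and four units.
u1 = Unit(frozenset({0, 1}), frozenset({"T1"}))
u2 = Unit(frozenset({2, 3}), frozenset({"T4"}))
u3 = Unit(frozenset({1, 2}), frozenset({"T6"}))
u4 = Unit(frozenset({0, 3}), frozenset({"T7"}))

BASE_COST = 100
# Assumed savings per unit; a negative saving makes the unit a net loss.
SAVING = {u1: 20, u2: 15, u3: 30, u4: -10}


def toy_cost(state: State) -> float:
    """Toy cost model: base cost minus the savings of the units used."""
    return BASE_COST - sum(SAVING[u] for u in state)


best_state, best_cost = search([u1, u2, u3, u4], BASE_COST, toy_cost)
print(best_state == (u1, u2), best_cost)  # prints True 65
```

In this toy run `u4` is discarded after the single-unit pass, `u3` is the cheapest single unit at cost 70, and the pair `(u1, u2)` wins at cost 65 because `u3` shares a base branch with each of the other two and cannot be combined with them. Discarding units early shrinks the number of multi-unit combinations that need to be costed at all.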
Provisional Applications (1)
Number    Date      Country
60782785  Mar 2006  US