Query Optimizer

Description

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1, previously discussed, is a block diagram that illustrates row- and column-store layouts for a database.

FIG. 2, previously discussed, is a graph that illustrates query nesting FIG. 3, previously discussed, depicts a left-deep join tree.

FIG. 4, previously discussed, depicts a bushy join tree.

FIG. 5 is a block diagram of one type of computer system that can be used to implement the present invention's teachings.

FIG. 6 is a block diagram of another type of computer system that can be used to implement the present invention's teachings.

FIG. 7 is a block diagram of the high-level steps that the illustrated embodiment of the invention performs.

FIG. 8 is a diagram of a typical resultant join tree.

FIG. 9 depicts a sample database schema used to illustrate how the illustrated embodiment operates.

FIG. 10 depicts an example query used to illustrate how the illustrated embodiment operates.

FIG. 11 is a diagram of the partitioning results produced by the illustrated embodiment in response to the example schema and query.

FIG. 12 sets forth pseudocode for one of the operations of FIG. 7.

FIG. 13 sets forth the dimension subqueries that the illustrated generates in response to the example schema and query.

FIG. 14 sets forth the anchor subquery that the illustrated embodiment generates in response to the example schema and query.

FIGS. 15A and 15B (together, “FIG. 15”) set forth pseudocode for one of the operations of FIG. 7.

FIG. 16 depicts example projections of the FIG. 9 schema's tables.

FIG. 17 is a diagram of the query that the illustrated embodiment determines for FIG. 13's first partition if FIG. 16's projections are available.

FIG. 18 is a diagram of a non-snowflake-forest schema.

FIG. 19 is a diagram that illustrates how to treat the schema of FIG. 18 as a snowflake-forest schema so as to employ the illustrated embodiment on it.

DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT

The present invention's teachings are applicable to a wide range of database systems implemented in essentially any type of computer system. An example is the computer system that FIG. 5 depicts. That drawing's computer system 26 includes a microprocessor 28. Data that the microprocessor 28 uses, as well as instructions that it follows in operating on those data, may reside in on-board cache memory or be received from further cache memory 30, possibly through the mediation of a cache controller 32. That controller can in turn receive such data and instructions from system read/write memory (“RAM”) 34 through a RAM controller 36 or from various peripheral devices through a system bus 38. Alternatively, the instructions may be obtained from read-only memory (“ROM”) 40, as may some permanent data.

The processor may be dedicated to DBMS operations, or it may additionally execute processes directed to other functions, and the memory space made available to DBMS operations may be “virtual” in the sense that it may actually be considerably larger than the RAM 34 provides. So the RAM's contents may be swapped to and from a system disk 42, which in any case may additionally be used instead of a read-only memory to store instructions and permanent data. The actual physical operations performed to access some of the most-recently visited parts of the process's address space often will actually be performed in the cache 30 or in a cache on board microprocessor 28 rather than in the RAM 34. Those caches would swap data and instructions with the RAM 34 just as the RAM 34 and system disk 42 do with each other.

In any event, the ROM 40 and/or disk 42 would usually provide persistent storage for the instructions that configure such a system to optimize query execution in the manner that will be explained below, but the system may instead or additionally receive them through a communications interface 44, which receives them from a remote server system. The electrical signals that typically carry such instructions are examples of the kinds of electromagnetic signals that can be used for that purpose. Others are radio waves, microwaves, and both visible and invisible light.

Of course, few computer systems that implement the present invention's teachings will be arranged in precisely the manner that FIG. 5 depicts. Moreover, it will be typical for the computer system actually to be implemented, as the example that FIG. 6 depicts is, in a plurality of networked computing nodes. In that system each of a plurality of networked computers 46 has direct access to a respective storage device 48 and communicates over a network through a network switch 50 with the other computers, through which it can indirectly obtain access to the data on those other computers' storage devices. To perform a query, a user employs terminal equipment 52 such as a keyboard and monitor coupled to one of the computers to communicate with a DBMS process that is executing on that computer. That process performs the query by accessing its own storage device 48 and/or those of other computers.

Independently of whether the host system is a single-processor system or a multi-node network, it will need to determine an execution plan to use in responding to queries, and it does so in a way that will now be explained by reference to an illustrative embodiment of the invention.

As will be explained below in more detail, the illustrated embodiment operates in three phases. In the first phase, which FIG. 7's block 54 represents, it rewrites the original (typically SQL) query into an “anchor query” (defined below) and a set of (again, typically SQL) “snowflake subqueries” (also defined below). Each of the snowflake subqueries contains some disjoint subset of the tables in the original query's FROM clause, as will be explained below by reference to an example.

A snowflake subquery is so named because the primary-to-foreign-key relationships of the tables it queries define a snowflake schema. A degenerate case of a snowflake subquery is a star subquery, whose tables form a star schema.

The anchor query joins the results produced by the snowflake subqueries into a result equivalent to that produced by the original query. Taken together, query rewriting 54 transforms the original SQL query into an equivalent SQL query that consists of the anchor query, with the snowflake subqueries as the nested inputs in its FROM clause.

In the second phase, which block 56 represents, the optimizer uses what in the illustrated embodiment is a lightweight plan-generation algorithm to produce a respective query subplan from each snowflake subquery. In the final phase 58, the optimizer generates the final logical plan for the anchor query by using any appropriate cost-based techniques (e.g., left-deep trees and dynamic programming) but treating each record set produced by a snowflake subquery's execution as a single table. The overall result is that the optimizer is not limited to considering only one shape of join tree. It may, for instance, consider a space that comprises plans that define self-similar left-deep trees such as the one that FIG. 8 depicts. That is, it may consider left-deep trees whose right branches can lead not only to tables but also to other left-deep trees.

To illustrate the FIG. 7 approach in more detail, we present in FIG. 9 an example schema to which we will apply the illustrated embodiment. In this schema, F₁, F₂, and F₃denote fact tables. For a retail chain, these tables may include tables of customer purchases (Purchases), supplier orders (Inventory), and item discounts (Specials). D₁, . . . , D₆are dimension tables. Again for a retail chain, these tables may include tables of customers (Customer), suppliers (Supplier), stores (Stores), items (Items), dates (Date), and states (State). Each table in the schema is shown with its attributes. Every fact table includes one or more attributes of the form FKi, which is a foreign key that refers to the primary key, PKi, of dimension table D_i.

As background, we will define a useful normal form for predicates. Attribute A of relation R is named by writing R.A. Relational operators include <, >, <=, >=, and =, and we denote them as <relop>. A predicate of the form <attribute-name><relop><attribute-name> or <attribute-name><relop><constant> is called a simple predicate. Simple predicates can be combined with Boolean operators (AND, OR, NOT) to form a complex predicate. NOT may appear only directly in front of an simple predicate (as in “NOT (R.A=6)”). Conjunctive Normal Form (“CNF”) is a complex predicate that is written as a conjunction of clauses in which each clause is a disjunction of simple predicates. For example, if A, B, C, D, and E are all simple predicates, then (A OR B) AND (NOT(B) OR C OR NOT(D)) AND (D OR NOT(E)) is in CNF. It can be shown that every complex predicate can be converted to CNF.

FIG. 10 depicts a typical data-warehouse query Q over the schema of FIG. 9. For present purposes, the important features to identify in any SQL query such as this are the following:

1) Join Tables: These are the tables that appear in the FROM clause of the SQL query. For Q, these are tables F₁, F₂, F₃, D₁, D₂, D₃, D₄, D₅, and D₆.

2) Predicates: These are the conjoined predicates in the WHERE clause. The predicates are of three types:

- Join predicates, which equate a fact-table foreign key with the associated dimension-table primary key. For query Q of FIG. 9, these are the predicates: FK1=PK1, FK2=PK2, FK3=PK3, FK4=PK4, FK5=PK5, and FK6=PK6.
- Restriction predicates over attributes of the dimension tables. For query Q of FIG. 9, the restriction predicates are G=10, H>10, J=0 and K=1.
- Crossfact predicates, which are predicates that compare fact-table attributes. For query Q of FIG. 9, the crossfact predicates are A1=B1 and B1=C1.

FIG. 11 is a graph-based representation of FIG. 10's query Q: the graph's nodes represent Q's nine join tables, and there is an edge from table T to table U if T contains a foreign key that one of Q's join predicates compares with U's primary key. Note that F₁, D₁, and D₂form a very simple “star schema” (a degenerate case of a snowflake schema). Similarly, F₂, D₃, D₄, and D₅form a star schema, as do F₃and D₆. We refer to the entire schema, consisting of unconnected snowflake-schema trees, as a snowflake forest.

C. Algorithm Steps
Step 1: Query Rewriting

The illustrated optimization algorithm's query-rewriting step represented by FIG. 7's block 54 takes a FIG. 11-type snowflake-forest query as input. The output of this step is a nested SQL query that is equivalent to Q. FIG. 12 depicts in pseudocode the algorithm (hereafter referred to as “Algorithm Rewrite”) that the illustrated embodiment uses to perform FIG. 7's query-rewriting operation 54.

The FIG. 12 algorithm's first step, which block 60 represents, partitions the snowflake-forest query graph into its snowflake-query components. When that step is applied to the query of FIG. 10, it results in the following three partitions, which reflect the organization described above by reference to FIG. 11:

Partition 1={F₁, D₁, D₂}

Partition 2={F₂, D₃, D₄, D₅}

Partition 3={F₃, D₆}

Note that this operation requires that the optimizer recognize primary-key/foreign-key relationships. This is possible because the database-designing user ordinarily declares those relationships in order to enable the DBMS to enforce referential integrity. If the designer fails to declare some such relationship, the illustrated optimizer will still work, although not as effectively. If the relationship depicted in FIG. 9 between FK1 in F₁and PK1 in D₁had not been declared, for example, the optimizer would place F₁and D₁in separate partitions rather than in the same one, but, as will become apparent as the description proceeds, a valid search strategy would nonetheless result.

As FIG. 12's block 62 indicates, each partition is then used to create a snowflake subquery. When this step is applied to the FIG. 10 query, the result is FIG. 13's three snowflake subqueries Q₁, Q₂, and Q₃.

Finally, the anchor query that connects the snowflake subqueries is constructed, as FIG. 12's block 64 indicates. In the example, the resultant anchor query is the one that FIG. 14 depicts.

Step 2: Snowflake Subquery Plan Generation

Having thus completed the query-rewriting operation that FIG. 7's block 54 represents the optimizer enters its second, block-56 phase, in which it uses a “lightweight” algorithm to generate a plan for each snowflake subquery. (This algorithm is lighter in weight than, e.g., state-of-the-art join enumeration because it applies a greedy heuristic to choose a join order, with the result that the amount of time required to make the choice increases only linearly with the number of projections considered for the query, and that number is less than or equal to the number of materialized views actually available.)

To explain this algorithm we first assume that its snowflake-subquery inputs are all simple star subqueries, as the example subqueries Q₁, Q₂, and Q₃all are. We will thereafter explain how the algorithm as thus described is extended to handle snowflake subqueries that are not necessarily star subqueries.

We also digress briefly to note that there is a large volume of work (e.g., Baralis, Paraboschi, Teniente, “Materialized View Selection in a Multidimensional Database,” in Proceedings of the 23^rdVLDB Conference, Athens, Greece, 1997, and H. Gupta and I. S. Mumick, “Selection of Views to Materialize under a Maintenance-Time Constraint,” International Conference on Database Theory (ICDT), 1999) that investigates how to decide which aggregate views should be materialized and therefore which views to use in processing a query.

The snowflake-subquery plan-generator algorithm (hereafter called “Algorithm Snowflake”) that will now be described by reference to the pseudocode of FIG. 15 takes advantage of certain materialized views. Specifically, it considers takes into account whatever projections the database's physical layout provides. Of course, it sometimes happens that there is only one projection per table (i.e., the table itself). But the algorithm takes advantage of others if they exist.

For any set of tables Ts used in a query Q, a projection P is said to cover Ts if P includes a respective column for every attribute that appears in Q and belongs to a table in Ts. But P need not include the attributes that occur only in Q's join predicates that join tables in Ts. To illustrate, consider the projections shown in FIG. 16, which include columns from FIG. 9's tables F₁, D₁, and D₂. Observe that:

- PF1_acovers {F₁} with respect to FIG. 10's query Q, because it includes every column of F₁that appears in Q.
- PF1_bcovers {F₁, D₁} with respect to FIG. 10's query Q, because it includes every column of F₁and D₁that appears in Q except those (FK1 and PK1) that appear in the join predicate between F₁and D₁.
- PF1_ccovers {F₁, D₂} with respect to FIG. 10's query Q, because it includes every column of F₁and D₂that appears in Q, except those (FK2 and PK2) that appear in the join predicate between F₁and D₂.
- PD1_acovers {D₁} with respect to FIG. 10's query Q because it includes every column of D₁that appears in Q.
- PD2_aand PD2_bboth cover {D₂} with respect to FIG. 10's query Q, because both include every column of D₂that appears in Q.

An Example Cost Model: FIG. 15's snowflake-subquery plan-generation algorithm assumes the existence not only of some set of projections but also of a cost model that is used to choose among those projections. Typically, a cost model includes a set of formulas that can be used to map any query plan to its expected cost. By applying these formulas to all candidate plans, the plan can be ranked from least to most expensive, and plan choices can be made accordingly. To keep the explanation simple, we present the cost model below in terms of the decision procedure it implies rather than as a set of formulas.

To understand this cost model, we need to introduce one additional concept. Predicates in a SQL WHERE clause are applied to a set of records. Those records for which the predicate is true are returned as the result set R. A predicate that returns very few records is said to be highly selective. One measure of the selectivity of a predicate applied to a set of records S is what we refer to as a “selection coefficient,” |R|/|S|: the lower the selection coefficient, the greater the selectivity is.

We use the simple cost model below to trace the steps of the snowflake-subquery plan-generation algorithm on an example query. It is important to note that this algorithm's applicability does not depend on the choice of actual cost model; the model described here is meant only to serve as an example.

1. Ranking Candidate Projections for the Anchor Projection: For each subquery Q's plan, a projection is chosen as the “anchor” projection, whose purpose will become apparent below. FIG. 15's block 66 represents making that choice. If only one existing projection by itself covers the fact table F in the snowflake defined by the subquery Q whose subplan is being determined, that projection is the one chosen as that subplan's anchor projection. If there is more than one such projection, we choose one of them as the anchor projection according to the following rules:

- a. Choose the projection that covers the largest set {F, D₁, . . . , D_k} of join tables in subquery Q. In case of a tie, use rule b to break it:
- b. Choose the tying projection whose selectivity is greatest (as indicated by the smallest product of the selection coefficients of all restriction predicates over attributes in the tables covered by the projection). In case of a tie, use rule c to break it:
- c. Choose the tying projection for which the predicates over the attributes forming the projection's sort order are collectively the most selective. That is, for each projection P_i, compute the product of the selection coefficients of the predicates in Q that impose restrictions on any of the attributes in P_i's sort order, and choose the projection whose product is the lowest. In case of a tie, use rule d to break it:
- d. Choose any tying projection that is sorted on the attributes in Q's GROUP BY clause. If no such projection exists, choose any tying projection.
  
  2. Ranking Candidate Projection Sets for the Anchor Projection: If there is no existing projection that covers the fact table, then a set of existing projections must be chosen that, when joined, will form a projection that covers the fact table. If there is more than one candidate set, the cost model is used to choose among them. These are the example cost model's rules for choosing among them:
- a. Choose the projection set whose cardinality is lowest. In case of a tie, use rule b break it.
- b. For each tying candidate projection set, compare the covering projection that would result from joining the projections in that set, and choose the set whose result would be chosen according to the above-described rules (rules 1a-d) for selecting a single anchor projection.
  
  3. Ranking Candidate Projections to Cover a Dimension Table D: In an operation represented by FIG. 15's block 68, a projection is chosen for each of the subquery's dimension tables D that the subquery's anchor projection does not cover. If more than one projection covers dimension table D, the following rules are used to choose one of those projections as D's covering projection:
- a. Choose the projection with respect to whose sort-order attributes the query's restriction predicates are most selective. That is, for each projection P_i, compute the product of the selection coefficients of predicates in Q that restrict the values of any of the attributes in P_i's sort order, and choose the projection whose product is the lowest. In case of a tie, break it by using rule b.
- b. Choose the tying projection whose sort order is the primary key of D. If no such projection exists, choose any tying projection.
  
  4. Ranking Restriction Predicates: In an operation that FIG. 15's block 70 represents, a left-deep join tree is constructed for the subquery. The anchor projection is the tree's leftmost leaf. To place the dimension-table-covering-projections in the tree, restriction predicates over a single projection must be ranked. For this example cost model, is ranking will be based on the predicates' selectivities. That is, predicates will be placed from bottom to top of the query plan in ascending order of their selection coefficients.

Tracing Subplan Generation over an Example Subquery: To illustrate the FIG. 15 embodiment's application to a single one of the subqueries, we will trace it on FIG. 13's subquery Q₁, which was one of the subqueries produced by performing the FIG. 12 operation on FIG. 10's query Q. For this example, we assume that FIG. 16's projections are available and that the following statistics apply:

the selection coefficient of the “G=10” predicate is 0.1, and

the selection coefficient of the “H>10” predicate is 0.5.

In an operation that FIG. 15's block 66 represents, an anchor projection is chosen that covers fact table F₁. Since projections PF1_a, PF1_b, and PF1_call cover F₁, they are all candidate projections. But PF1_adoes not additionally cover any dimension table, whereas PF1_bcovers D₁, and PF1_ccovers D₂. So, according to the example cost model's rule 1a, PF1_ais eliminated from contention. This leaves PF1_b, and PF1_c, and, because of the cost model's rule 1b, PF1_bis chosen; the selection coefficient of D₁(which PF1_bcovers) is 0.1 (equal to the selection coefficient of “G=10”), whereas the selection coefficient of D₂(which PF1_ccovers) is 0.5 (equal to the selection coefficient of “H>10”).

In an operation that FIG. 15's block 68 represents, a projection is chosen to cover the remaining uncovered dimension table, D₂. There are two candidate projections; PD2_aand PD2_bboth cover D₂. But, whereas the selection coefficient of PD2_ais 1 (because Q has no restriction predicate over PD2_a's sort-order attribute, PK2), the selection coefficient of PD2_bis 0.5 (because the “H>10” restriction predicate restricts, with selection coefficient 0.5, the values of PD2_b's sort-order attribute H). So application of the above-described cost model's rule 3a results in choosing projection PD2_b.

In an operation that FIG. 15's block 70 represents, a left-deep join tree is constructed with anchor projection PF1_band dimension projection PD2_b.

In an operation that FIG. 15's block 72 represents, a selection operation for every restriction predicate in Q₁is placed just above the appropriate projection. Thus, “σ_G=10” is placed over PF1_b, and “σ_H>10” is placed over PD2_b. Since Q₁includes no GROUP BY or ORDER BY clause, the operations that blocks 74 and 76 represent result in no additions to the plan, and the algorithm terminates, leaving the query plan shown in FIG. 17.

Extensions of Algorithm Snowflake: The pseudocode shown in FIG. 15 assumes that the input is a star-schema query. To handle general snowflake queries, one first identifies the “snowflake degree” of every dimension table D in a query, where a table D's snowflake degree in a given query is defined as the length of the longest path that starts at the table-D-representing node in that query's graph. For each dimension table D whose snowflake degree is one, the FIG. 15 subplan-generation operation is performed by treating dimension table D as the fact table and treating as the dimension tables all tables reachable from that table. Then, for each dimension table D whose path length is two, the FIG. 15 subplan-generation operation is performed by treating dimension table D as the fact table and treating as the dimension tables (1) all snowflake-degree-zero tables directly reachable from table D and (2) all the record sets that will result from execution of the subplans previously determined for all other tables directly reachable from table D. This continues for increasingly high snowflake degrees until subplans have been determined for all dimension tables whose snowflake dimensions exceed zero. Then the FIG. 15 operation is executed a final time, with the real fact table F taken as the fact table, and, as the dimension tables (1) all snowflake-degree-zero tables directly reachable from table F and (2) all the record sets that will result from execution of the subplans previously determined for all other tables directly reachable from table F.

Another extension to Algorithm Snowflake allows for processing of an “almost snowflake” query, whose graph representation is a collection of directed acyclic graphs (“DAGs”) rather than a collection of trees. FIG. 18 is the graphical representation of one such query. That query is not a snowflake query, because dimension table D₁is shared by F₁and F₂, and dimension table D₆is shared by F₁and F₃. To handle such queries, a simple preprocessing step replicates all nodes that are reachable through multiple distinct paths. The result of such replication is to convert an “almost snowflake” query into a snowflake query. For example, the example graph of FIG. 18 would first be modified to that of FIG. 19, where shared dimension tables D₁and D₆have been replicated to be tables D′₁and D′₆, respectively, before the FIG. 15 operation is executed.

Step 3: Cost-Based Join Enumeration

The query-optimization algorithm's last step—i.e., the step that FIG. 7's block 58 represents—involves running some cost-based plan-generation algorithm on the anchor query Q_agenerated in the (block 54) query-rewrite phase. (For example, this step could, but need not, use the well-known approach of enumerating and costing left-deep trees by using dynamic programming.) What is important to note is that the table inputs to the cost-based optimization step are not the original tables of the input query; they are the partitions produced during the rewrite step of the optimization algorithm. Where each partition would appear in the final plan, the snowflake subplan for that partition that was produced in Step 2 of the algorithm is substituted.

As was mentioned above, a complete access plan specifies not only what logical operations are to be performed but also lower-level details such as what join algorithms to use. The system decides these details, too, and the resultant complete plan is executed to obtain the query results and generate output signals that represent those results.

Note that the complexity of cost-based plan generation has been reduced by using multi-table partitions, rather than individual tables, as the input to the plan generator. This means that instead of scaling to twelve to fifteen tables, the algorithm scales to twelve to fifteen partitions (potentially, twenty-five or tables or more, the number depending on how many tables fall in each partition). For example, whereas query Q of FIG. 10 has nine join tables, it has only three partitions, so in that example the complexity of cost-based join enumeration has been reduced by a factor of three. The present invention therefore constitutes a significant advance in the art.

Claims

1. For employing a computer system to execute a query on a database, a method comprising: A) providing inputs to the computer system that represent a query that, applied to the database, defines a plurality of independent snowflake schemas; andB) employing the computer system to: i) compute, separately for each of the snowflake schemas, an access plan for obtaining from that schema's tables a record set that includes the data requested by the query from those tables;ii) compute an access plan for obtaining the query's results from the record sets in which execution of the access plans thus computed will result;iii) obtain the query's results by executing the access plan; andiv) generate an output that represents the query results thus obtained.
2. A method as defined by claim 1 wherein, for at least one said independent snowflake schema, the access plan computed for that independent snowflake schema is computed by computing, separately for each table whose snowflake degree in that independent snowflake schema exceeds zero, a respective access plan for obtaining a record set that includes the data requested by the query from that table and the tables reachable therefrom, that access plan being computed by treating that table as the fact table of a star schema and treating as a dimension table of that star schema: A) any snowflake-degree-zero table directly reachable from that table; andB) any record set that will result from execution of the access plan determined for each table of snowflake degree greater than zero that is directly reachable from that table.
3. A method as defined in claim 1 wherein the database is a column store.
4. For executing a queries on a database, a computer system configured to: A) maintain a database;B) accept input signals that represent a query that, applied to the database, defines a plurality of independent snowflake schemas; andC) in response thereto: i) compute, separately for each of the snowflake schemas, an access plan for obtaining from that schema's tables a record set that includes the data requested by the query from those tables;ii) compute an access plan for obtaining the query's results from the record sets in which execution of the access plans thus computed will result;iii) obtain the query's results by executing the access plan; andiv) generate an output that represents the query results thus obtained.
5. A computer system as defined in claim 4 wherein, for at least one said independent snowflake schema, the access plan computed for that independent snowflake schema is computed by computing, separately for each table whose snowflake degree in that independent snowflake schema exceeds zero, a respective access plan for obtaining a record set that includes the data requested by the query from that table and the tables reachable therefrom, that access plan being computed by treating that table as the fact table of a star schema and treating as a dimension table of that star schema: A) any snowflake-degree-zero table directly reachable from that table; andB) any record set that will result from execution of the access plan determined for each table of snowflake degree greater than zero that is directly reachable from that table.
6. A computer system as defined in claim 4 wherein the database is a column store.
7. A storage medium containing machine instruction readable by a computer system to configure the computer system to: A) maintain a database;B) accept input signals that represent a query that, applied to the database, defines a plurality of independent snowflake schemas; andC) in response thereto: i) compute, separately for each of the snowflake schemas, an access plan for obtaining from that schema's tables a record set that includes the data requested by the query from those tables;ii) compute an access plan for obtaining the query's results from the record sets in which execution of the access plans thus computed will result;iii) obtain the query's results by executing the access plan; andiv) generate an output that represents the query results thus obtained.
8. A storage medium as defined in claim 7 wherein, for at least one said independent snowflake schema, the access plan computed for that independent snowflake schema is computed by computing, separately for each table whose snowflake degree in that independent snowflake schema exceeds zero, a respective access plan for obtaining a record set that includes the data requested by the query from that table and the tables reachable therefrom, that access plan being computed by treating that table as the fact table of a star schema and treating as a dimension table of that star schema: A) any snowflake-degree-zero table directly reachable from that table; andB) any record set that will result from execution of the access plan determined for each table of snowflake degree greater than zero that is directly reachable from that table.
9. A storage medium as defined in claim 7 wherein the database is a column store.
10. For employing a computer system to execute a query on a database that contains a plurality of base tables that have primary-key/foreign-key relationships, a method comprising: A) providing input signals to the computer system that represent a query to be applied to the database; andB) employing the computer system to: i) choose a set of access plans in accordance with the primary-key/foreign-key relationships among tables in the query;ii) select an access plan from among the chosen access plans in accordance with predetermined selection criteria;iii) obtain the query's results by executing the access plan; andiv) generate an output indicative of the query results thus obtained.
11. For executing queries on a database, a computer system configured to: A) maintain a database that contains a plurality of tables that have primary-key/foreign-key relationships among them;B) accept input signals that represent a query; andC) in response thereto: i) choose a set of access plans in accordance with the primary-key/foreign-key relationships among tables in the query;ii) select an access plan from among the chosen access plans in accordance with predetermined selection criteria;iii) obtain the query's results by executing the access plan; andiv) generate an output indicative of the query results thus obtained.
12. A storage medium containing machine instruction readable by a computer system to configure the computer system to: A) maintain a database that contains a plurality of tables that have primary-key/foreign-key relationships among them;B) accept input signals that represent a query; andC) in response thereto: i) choose a set of access plans in accordance with the primary-key/foreign-key relationships among tables in the query;ii) select an access plan from among the chosen access plans in accordance with predetermined selection criteria;iii) obtain the query's results by executing the access plan; andiv) generate an output indicative of the query results thus obtained.

Query Optimizer

Information

Publication Number

Date Filed

Date Published

Inventors

CPC

US Classifications

International Classifications

Abstract

Description

Claims