Query processing with multiple distinct aggregates, cube, rollup, and grouping sets can include maintaining separate streams of groupings and then performing operations, such as group by, join, etc., on the streams to generate query results. The number of streams and the number of group by or join operations are proportional, for example, to the number of distinct aggregates in the query. This results in increased memory usage and thus increased expense as the number of distinct operations to respond to a query increases. This can further result in a distributed deadlock.
Features of the present disclosure are illustrated by way of example and not limited in the following figure(s), in which like numerals indicate like elements, in which:
For simplicity and illustrative purposes, the present disclosure is described by referring mainly to examples. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure.
Throughout the present disclosure, the terms “a” and “an” are intended to denote at least one of a particular element. As used herein, the term “includes” means includes but not limited to, the term “including” means including but not limited to. The term “based on” means based at least in part on.
A query processing apparatus and method are described herein and provide for efficient answering of structured query language (SQL) queries with multiple distinct aggregates (MDAs), and SQL queries with cube, rollup or grouping sets operations. Generally, the query processing apparatus includes a query input module to receive a query. A query determination module determines whether the SQL query includes MDAs, or cube, rollup or grouping sets operations. Based on the determination, an intermediate processing module processes the From and Where clauses of the query and forwards the results to a group generator module. The group generator module generates groupings as an output specific to the query type. For a SQL query including MDAs, a SQL-MDA group by processing module performs two successive group by operations on the output of the group generator module to answer the query. For a SQL query including cube, rollup or grouping sets operations, a SQL cube, rollup and grouping sets group by processing module performs a single group by operation on the output of the group generator module to answer the query.
The apparatus and method provide for the processing of intermediate results (or the results of intermediate groupings) that are generated during processing of SQL queries with MDAs, or cube, rollup or grouping sets operations. The number of group by operations needed to answer the foregoing query types is also bounded. Furthermore, the intermediate results (i.e., the output of the group generator module) are maintained in a single stream, which eliminates the possibility of distributed deadlock.
For a SQL query including MDAs, the number of group by operations used is two. Thus, for a SQL query including MDAs, the number of group by operations is independent of the number of distinct aggregates in the query and remains at two. For a SQL query including cube, rollup and grouping sets operations, the number of group by operations used is one. Since the number of group by operations is limited, the resources needed are also limited, and are therefore unrelated, for example, to the number of MDAs, or the number of keys in cube or rollup operations, or the number of sets in the grouping sets operation.
The modules 102, 106, 108, 110, 116, 118 and 122, and other components of the apparatus 100 may comprise machine readable instructions stored on a computer readable medium. In addition, or alternatively, the modules 102, 106, 108, 110, 116, 118 and 122, and other components of the apparatus 100 may comprise hardware or a combination of machine readable instructions and hardware.
Generally, the group generator module 110 generates groupings used to answer the query 104 based on whether the SQL query includes MDAs, or cube, rollup or grouping sets operations. An example of a SQL query with MDAs for a Table 1 (i.e., Table foo) is as follows:
A SQL query with MDAs (e.g., two distinct aggregations for the following example) may include: select a, sum(distinct b), count(distinct c) from foo group by a. A SQL query including cube, rollup and grouping sets operations allows for the performance of multi-level aggregations in a single query. An example of a SQL query including a cube operation for Table 1 may include: select a, b, sum(b) from foo group by cube(a, b). An example of a SQL query including a rollup operation for Table 1 may include: select a, b, sum(b) from foo group by rollup (a, b). Similarly, an example of a SQL query including a grouping sets operation for Table 1 may include: select a, b, sum(b) from foo group by grouping sets (a, (b,c)).
Generally, SQL queries including MDAs or cube, rollup or grouping sets operations use multiple groups that are formed and processed. For the foregoing example of a SQL query including MDAs, the query returns a sum of all unique values of b, and a count of all unique values of c, for each unique value of a. This operation is based on a determination of all unique values of b and c for each unique value of a, which uses the groupings {a, b} and {a, c}. For the foregoing example of a SQL query including a cube operation, the operation uses the groupings { }, {a}, {b}, {a, b}. For the foregoing example of a SQL query including a rollup operation, the operation uses the groupings { }, {a}, {a, b}. Further, for the foregoing example of a SQL query including a grouping sets operation, the operation uses the groupings {a}, {b, c}. As described in detail below, the query processing apparatus and method provide for the generation, propagation and processing of these different groupings.
With regard to the processing of the foregoing SQL query with MDAs, the modules 110 and 116 apply the transformation shown below:
The group generator module 110 thus implements the innermost subquery (i.e., select a, b, null as c, 0 as grouping_id from foo union all select a, null, c, 1 from foo) in the foregoing transformation. The two outer select blocks (i.e., select a, b, c, grouping_id from ( . . . ) group by a, b, c, grouping_id and select a, sum(b), count(c) from ( . . . ) group by a) present the two group by operations that are applied over the output 112 of the group generator module 110 by the SQL-MDA group by processing module 116.
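As a minimal sketch of the transformation described above, the following Python snippet runs both the original MDA query and the three nested select blocks against an in-memory SQLite database and compares the results. The row values for table foo are hypothetical (the data of Table 1 is not reproduced here), and the use of SQLite and the helper name run are assumptions made only for illustration.

```python
import sqlite3

# Hypothetical rows for foo(a, b, c); not the data of Table 1.
ROWS = [(1, 10, 5), (1, 10, 6), (1, 20, 5), (2, 30, 7), (2, 30, 7)]

# The original query with two distinct aggregates.
DIRECT = "select a, sum(distinct b), count(distinct c) from foo group by a order by a"

# Innermost block: group generator. Middle block: inner group by, which
# removes duplicates within each grouping. Outer block: final aggregation.
TRANSFORMED = """
select a, sum(b), count(c)
from (
    select a, b, c, grouping_id
    from (
        select a, b, null as c, 0 as grouping_id from foo
        union all
        select a, null, c, 1 from foo
    )
    group by a, b, c, grouping_id
)
group by a
order by a
"""

def run(sql):
    """Load the hypothetical rows into a fresh database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table foo (a integer, b integer, c integer)")
    conn.executemany("insert into foo values (?, ?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(run(DIRECT))
    print(run(TRANSFORMED))
```

Both statements return the same rows for this data, because sum and count ignore the null values that each union all branch places in the column it does not carry.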
The following describes, for SQL queries with MDAs generally, the specification of the group generator module 110 and the two group by operations that the SQL-MDA group by processing module 116 applies to the output 112 of the group generator module 110.
With regard to the general specification of the group generator module 110 for handling SQL queries with MDAs, consider the following generalized SQL:
select g1, . . . , gm, agg(a1), . . . , agg(ak), agg(distinct d1), . . . , agg(distinct dn)
from < >
where < >
group by g1, . . . , gm
The foregoing SQL query with MDAs contains m (m≧0) grouping columns (g1, . . . , gm), k (k≧0) non-distinct aggregates, and n (n>1) distinct aggregates. Since operation of the group generator module 110 is independent of the contents of the foregoing From and Where clauses (i.e., from < >, where < >), no details are provided as to the contents of these clauses.
For the specification of the group generator module 110, let “foo” represent the input data stream (the result of processing the Where clause (i.e., where < >) by the intermediate processing module 108) to the group generator module 110. The group generator module 110 will output (m+n+k+1) columns at the output 112, as presented by the following SQL:
select g1, . . . , gm, a1, . . . , ak, d1, . . . , null, . . . , null, 0 as grouping_id from foo
union all
select g1, . . . , gm, null, . . . , null, null, . . . , di, . . . , null, i-1 from foo
union all
select g1, . . . , gm, null, . . . , null, . . . , null, . . . , null, . . . , dn, n-1 from foo
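The pattern of the union all branches above can be sketched with a small Python helper that builds the statement from column lists. The function name group_generator_sql, its parameters, and the column aliases in the first branch are hypothetical conveniences, not part of the specification; grouping ids are numbered from 0 to n-1 as in the first and last branches above.

```python
def group_generator_sql(group_cols, agg_cols, distinct_cols, source="foo"):
    """Build one union all branch per distinct column (hypothetical helper).

    Each branch emits m grouping columns, k non-distinct aggregate inputs,
    n distinct columns and a grouping id, i.e. m + k + n + 1 columns.
    """
    branches = []
    for idx in range(len(distinct_cols)):
        first = idx == 0
        # Only branch 0 carries the inputs of the non-distinct aggregates.
        aggs = list(agg_cols) if first else ["null"] * len(agg_cols)
        dists = []
        for j, col in enumerate(distinct_cols):
            if j == idx:
                dists.append(col)  # the one distinct column kept in this branch
            else:
                # Alias nulls in the first branch so output columns are named.
                dists.append("null as " + col if first else "null")
        gid = "0 as grouping_id" if first else str(idx)
        cols = list(group_cols) + aggs + dists + [gid]
        branches.append("select " + ", ".join(cols) + " from " + source)
    return "\nunion all\n".join(branches)

if __name__ == "__main__":
    print(group_generator_sql(["g1"], ["a1"], ["d1", "d2"]))
```

For one grouping column, one non-distinct aggregate and two distinct aggregates, the helper produces two branches of four columns each, consistent with the (m+n+k+1) column count stated above.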
For the foregoing generalized SQL query with MDAs, the two group by operations performed by the SQL-MDA group by processing module 116 are specified as follows. The inner group by operation may be specified as follows:
Group by:
Grouping columns: (g1, . . . , gm, grouping_id)
Aggregates: agg(a1), . . . , agg(ak)
The outer group by may be specified as follows:
Group by:
Grouping columns: (g1, . . . , gm)
Aggregates: (agg(d1), . . . , agg(dn), agg_convert(a1), . . . , agg_convert(ak))
For the foregoing outer group by, agg_convert is a converted aggregate as specified by the following two rules. For rule 1, if agg is “count” or “count(*)” then agg_convert is “sum”. For rule 2, if agg is neither “count” nor “count(*)” then agg_convert is the same as agg.
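The two conversion rules can be written as a short function. This is only a sketch: the function signature and the lower-casing of aggregate names are assumptions for illustration. The reasoning behind rule 1 is that the inner group by already produces partial counts, so the outer group by has to add those partial counts together rather than count them again.

```python
def agg_convert(agg):
    """Map an inner aggregate name to the aggregate used by the outer group by."""
    name = agg.strip().lower()
    # Rule 1: partial counts from the inner group by are summed.
    if name in ("count", "count(*)"):
        return "sum"
    # Rule 2: every other aggregate is applied again unchanged.
    return name

if __name__ == "__main__":
    for agg in ("count", "count(*)", "sum", "max"):
        print(agg, "->", agg_convert(agg))
```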
Based on the foregoing discussion related to operation of the intermediate processing module 108 and the group generator module 110 for generalized SQL queries with MDAs, a SQL query with MDAs is processed by first processing the From and the Where clauses of the query (i.e., the where < > clause in the foregoing example of generalized SQL queries with MDAs). The Where clause is processed by the intermediate processing module 108. The output of the intermediate processing module 108 is fed into the group generator module 110, which outputs (m+n+k+1) columns at the output 112. The output 112 of the group generator module 110 is fed to the SQL-MDA group by processing module 116, which performs two successive group by operations on the output 112 to answer a query.
An example of an operation of the group generator module 110 for a SQL query with MDAs is described. The SQL query including MDAs relates to Table 2 (i.e., Table Orders), which also includes the data specified below:
For Table 2, the terms are specified as follows:
orderid=Order Identification (ID)
prodid=Product ID
dealerid=Dealer ID
amount=Amount
quantity=Quantity
For a SQL query including MDAs, the query may specify:
select prodid, sum(distinct amount) as sum_amount, sum(distinct quantity) as
sum_quantity from orders group by prodid;
The output of the SQL query including MDAs is specified in Table 3:
For Table 3, the terms that are not previously defined are specified as follows:
sum_amount=Summation of Amount
sum_quantity=Summation of Quantity
In order to transform the foregoing example of a SQL query including MDAs, the transformation shown below is applied by the modules 110 and 116:
The output of each of the blocks in the foregoing example of a SQL query including MDAs is given below in Tables 4-6.
For Table 4, the terms that are not previously defined are specified as follows:
grouping_id=Grouping ID
With regard to a SQL query including cube, rollup or grouping sets operations, generally, the group generator module 110 generates the groupings as the output 114. The output 114 of the group generator module 110 is fed to the SQL cube, rollup and grouping sets group by processing module 118, which performs a single group by operation on the output 114 to answer a query. The answer to the query 104 is output at 120 by the query response module 122.
An example of a general SQL query including cube, rollup and grouping sets operations is as follows:
select g1, . . . , gm, agg(a1), . . . ,agg(ak)
from < >
where < >
group by OPR(g1, . . . , gm);
OPR(g1, . . . , gm) may be any of the following:
With regard to cube(g1, . . . , gm), the group generator module 110 is specified as follows. Cube(g1, . . . , gm) forms 2^m groupings (i.e., all possible combinations over columns (g1, . . . , gm)). The group generator module 110 outputs at 114 (m+k+1) columns, as presented by the following SQL:
Let “foo” represent the input data stream (the result of processing the From and Where clauses (i.e., from < >, where < >) by the intermediate processing module 108) to the group generator module 110. Each “union all” branch below will generate one of the 2^m combinations.
select null, . . . , null, . . . , null, a1, . . . , ak, 0 as grouping_id from foo
union all
select g1, . . . , null, . . . , null, a1, . . . , ak, 1 from foo
union all
select null, . . . , gi, . . . , null, a1, . . . , ak, i from foo
union all
select null, . . . , null, . . . , gm, a1, . . . , ak, m from foo
union all
select g1, . . . , gi, . . . , gm, a1, . . . , ak, 2^m from foo
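The enumeration of the cube branches can be sketched in Python as follows. The helper names cube_groupings and cube_branches are hypothetical, the ordering of the subsets (smallest first) is an arbitrary choice, and this sketch numbers the grouping ids consecutively from 0, which is an assumption about how the ids are assigned.

```python
from itertools import combinations

def cube_groupings(group_cols):
    """Return every subset of the grouping columns: 2^m subsets for m columns."""
    result = []
    for size in range(len(group_cols) + 1):
        for subset in combinations(group_cols, size):
            result.append(subset)
    return result

def cube_branches(group_cols, agg_cols, source="foo"):
    """Build one select per grouping; columns outside the grouping become null."""
    branches = []
    for gid, subset in enumerate(cube_groupings(group_cols)):
        keys = [g if g in subset else "null" for g in group_cols]
        cols = keys + list(agg_cols) + [str(gid)]
        branches.append("select " + ", ".join(cols) + " from " + source)
    return branches

if __name__ == "__main__":
    print("\nunion all\n".join(cube_branches(["a", "b"], ["x"])))
```

For two grouping columns the sketch yields four branches, one for each of the groupings { }, {a}, {b} and {a, b}.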
With regard to rollup(g1, . . . , gm), the group generator module 110 may be specified as follows. The group generator module 110 outputs (m+k+1) columns, as presented by the following SQL. Let “foo” represent the input data stream from the intermediate processing module 108 to the group generator module 110. Rollup(g1, . . . , gm) will form (m+1) groupings: ( ), (g1), (g1, g2), (g1, g2, g3), . . . , (g1, g2, . . . , gm). Each “union all” branch below will generate one of the above (m+1) groupings as follows:
select null, . . . , null, . . . , null, a1, . . . , ak, 0 as grouping_id from foo
union all
select g1, . . . , null, . . . , a1, . . . , ak, 1 from foo
union all
select g1, . . . , gi, . . . , null, . . . , a1, . . . , ak, i from foo
union all
select g1, . . . , gi, . . . , gm, a1, . . . , ak, m from foo
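The (m+1) rollup groupings are the prefixes of the grouping column list, which can be sketched in a few lines of Python. The helper names rollup_groupings and rollup_branches are hypothetical and serve only to illustrate the branch pattern above.

```python
def rollup_groupings(group_cols):
    """Return the m+1 prefixes of the grouping columns: (), (g1), (g1, g2), ..."""
    return [tuple(group_cols[:i]) for i in range(len(group_cols) + 1)]

def rollup_branches(group_cols, agg_cols, source="foo"):
    """Build one select per prefix; trailing grouping columns become null."""
    branches = []
    for gid, prefix in enumerate(rollup_groupings(group_cols)):
        keys = list(prefix) + ["null"] * (len(group_cols) - len(prefix))
        cols = keys + list(agg_cols) + [str(gid)]
        branches.append("select " + ", ".join(cols) + " from " + source)
    return branches

if __name__ == "__main__":
    print("\nunion all\n".join(rollup_branches(["g1", "g2"], ["a1"])))
```

The grouping id of each branch equals the number of grouping columns it keeps, so the ids run from 0 to m as in the branches above.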
With regard to grouping sets(subset1(g1, . . . , gm), . . . , subsetn(g1, . . . , gm)), the group generator module 110 may be specified as follows. The group generator module 110 outputs (m+k+1) columns, as presented by the following SQL. Let “foo” represent the input data stream from the intermediate processing module 108 to the group generator module 110. Grouping sets(subset1(g1, . . . , gm), . . . , subsetn(g1, . . . , gm)) will form (n) groupings: subset1(g1, . . . , gm), . . . , subsetn(g1, . . . , gm). Each “union all” branch below will generate one of the above (n) groupings.
select subset_1(g1, . . . , gm), a1, . . . , ak, 0 as grouping_id from foo
union all
select subset_i(g1, . . . , gm), a1, . . . , ak, i from foo
union all
select subset_n(g1, . . . , gm), a1, . . . , ak, n from foo
The SQL cube, rollup and grouping sets group by processing module 118, which receives the output 114 of the group generator module 110, is specified as follows. The group by operation of the SQL cube, rollup and grouping sets group by processing module 118 proceeds as follows:
Group by:
Grouping columns: (g1, . . . , gm,grouping_id)
Aggregates: (agg(a1), . . . , agg(ak))
Based on the foregoing discussion related to operation of the group generator module 110 for a SQL query including cube, rollup or grouping sets operations, generally, a SQL query including cube, rollup or grouping sets operations is processed by first processing the From and the Where clauses of the query (i.e., the where < > clause in the foregoing example of generalized SQL query including cube, rollup or grouping sets operations). The Where clause is processed by the intermediate processing module 108. The output of the intermediate processing module 108 is fed into the group generator module 110. The output 114 of the group generator module 110 is fed to the SQL cube, rollup and grouping sets group by processing module 118, which performs a single group by operation on the output 114 to answer a query. The answer to the query 104 is output at 120 by the query response module 122.
An example of an operation of the group generator module 110 for a SQL query including a cube operation is described. The SQL query including a cube operation relates to Table 2 (i.e., Table Orders) as described previously. For the SQL query including a cube operation, the query may specify:
select prodid, dealerid, sum(amount) as sum_amount, sum(quantity) as
sum_quantity
from orders
group by cube(prodid, dealerid)
The output of the SQL query including the cube operation is specified in Table 7:
In order to transform the foregoing example of the SQL query including the cube operation, the transformation shown below is applied by the modules 110 and 118:
The output of inner and outer query blocks in the foregoing example of a SQL query including the cube operation, as processed by the group generator module 110 and the SQL cube, rollup and grouping sets group by processing module 118, is shown below in Tables 8 and 9, respectively.
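As a hedged illustration of the cube case, the following Python sketch applies the union all group generator and a single group by to an orders table in an in-memory SQLite database. The order rows are hypothetical (they are not the data of Table 2), and SQLite, the helper name cube_emulation and the final ordering of the rows are assumptions made only for this example.

```python
import sqlite3

# Hypothetical rows (orderid, prodid, dealerid, amount, quantity); not Table 2.
ORDERS = [
    (1, 100, 7, 50, 2),
    (2, 100, 8, 30, 1),
    (3, 200, 7, 20, 4),
]

# Inner block: one union all branch per cube grouping (2^2 = 4 branches).
# Outer block: the single group by over the grouping columns and grouping_id.
CUBE_EMULATION = """
select prodid, dealerid, sum(amount) as sum_amount, sum(quantity) as sum_quantity
from (
    select null as prodid, null as dealerid, amount, quantity, 0 as grouping_id from orders
    union all
    select prodid, null, amount, quantity, 1 from orders
    union all
    select null, dealerid, amount, quantity, 2 from orders
    union all
    select prodid, dealerid, amount, quantity, 3 from orders
)
group by prodid, dealerid, grouping_id
order by grouping_id, prodid, dealerid
"""

def cube_emulation():
    """Run the emulated cube query against the hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table orders (orderid integer, prodid integer, "
        "dealerid integer, amount integer, quantity integer)"
    )
    conn.executemany("insert into orders values (?, ?, ?, ?, ?)", ORDERS)
    rows = conn.execute(CUBE_EMULATION).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in cube_emulation():
        print(row)
```

Including grouping_id in the group by keeps rows from different groupings apart even when their grouping column values coincide, for example when a grouping column contains nulls in the source data.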
An example of the operation of the group generator module 110 for a SQL query including a rollup operation is described. The SQL query including a rollup operation relates to Table 2 (i.e., Table Orders) as described previously. For the SQL query including a rollup operation, the query may specify:
select prodid, dealerid, sum(amount) as sum_amount, sum(quantity) as
sum_quantity
from orders
group by rollup(prodid, dealerid)
The output of the SQL query including the rollup operation is specified in Table 10:
In order to transform the foregoing example of the SQL query including the rollup operation, the transformation shown below is applied by the modules 110 and 118:
The output of inner and outer query blocks in the foregoing example of a SQL query including the rollup operation, as processed by the group generator module 110 and the SQL cube, rollup and grouping sets group by processing module 118, is shown below in Tables 11 and 12, respectively.
An example of an operation of the group generator module 110 for a SQL query including a grouping sets operation is described. The SQL query including a grouping sets operation relates to Table 2 (i.e., Table Orders) as described previously. For the SQL query including a grouping sets operation, the query may specify:
select prodid, dealerid, sum(amount) as sum_amount, sum(quantity) as
sum_quantity
from orders
group by grouping sets(dealerid, (prodid, dealerid))
The output of the SQL query including the grouping sets operation is specified in Table 13:
In order to transform the foregoing example of the SQL query including the grouping sets operation, the transformation shown below is applied by the modules 110 and 118:
The output of inner and outer query blocks in the foregoing example of a SQL query including the grouping sets operation, as processed by the group generator module 110 and the SQL cube, rollup and grouping sets group by processing module 118, is shown below in Tables 14 and 15, respectively.
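The single-stream idea behind the grouping sets case can also be sketched without SQL. In the following Python sketch, a generator plays the role of the group generator and a dictionary-based aggregation plays the role of the single group by. The order rows are hypothetical (not the data of Table 2), and the names generate_groups and single_group_by are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical order rows; not the data of Table 2.
ORDERS = [
    {"prodid": 100, "dealerid": 7, "amount": 50, "quantity": 2},
    {"prodid": 100, "dealerid": 8, "amount": 30, "quantity": 1},
    {"prodid": 200, "dealerid": 7, "amount": 20, "quantity": 4},
]

GROUP_COLS = ("prodid", "dealerid")
# grouping sets(dealerid, (prodid, dealerid))
GROUPING_SETS = [("dealerid",), ("prodid", "dealerid")]

def generate_groups(rows):
    """Yield one tagged row per input row per grouping set, as a single stream."""
    for gid, subset in enumerate(GROUPING_SETS):
        for row in rows:
            # Columns outside the current grouping set are replaced by None.
            key = tuple(row[c] if c in subset else None for c in GROUP_COLS)
            yield key + (gid,), row["amount"], row["quantity"]

def single_group_by(stream):
    """Aggregate the stream once, keyed on (prodid, dealerid, grouping_id)."""
    totals = defaultdict(lambda: [0, 0])
    for key, amount, quantity in stream:
        totals[key][0] += amount
        totals[key][1] += quantity
    return {key: tuple(vals) for key, vals in totals.items()}

if __name__ == "__main__":
    for key, sums in single_group_by(generate_groups(ORDERS)).items():
        print(key, sums)
```

Because every grouping flows through the same stream and the same aggregation step, adding another grouping set adds rows to the stream but no further group by operation.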
Referring to
At block 202, a determination is made as to whether the query is a SQL query including MDAs, or a SQL query including cube, rollup or grouping sets operations. For example, referring to
At block 203, based on the determination, the query is processed to generate an output. For example, referring to
At block 204, based on the query type, a predetermined number of maximum group by operations are performed on the output to generate a response to the query. For example, referring to
Referring to
At block 302, a determination is made as to whether the query is a SQL query including MDAs, or a SQL query including cube, rollup or grouping sets operations. For example, referring to
At block 303, the From and the Where clauses of the query are processed. For example, referring to
At block 304, for a SQL query including MDAs, a first output is generated. For example, referring to
At block 305, for a SQL query including cube, rollup or grouping sets operations, a second output is generated. For example, referring to
At block 306, for a SQL query including MDAs, inner and outer group by operations are generated and performed. For example, referring to
At block 307, for a SQL query including the cube operation, 2^m groupings are generated, where m is a number of grouping columns for the cube operation. For example, cube(g1, . . . , gm) forms 2^m groupings (i.e., all possible combinations over columns (g1, . . . , gm)).
At block 308, for a SQL query including the rollup operation, m+1 groupings are generated, where m is a number of grouping columns for the rollup operation. For example, the group generator module 110 outputs m+1 groupings, each with (m+k+1) columns, for a SQL query including the rollup operation with m number of grouping columns and k number of aggregates.
At block 309, for a SQL query including the grouping sets operation, n groupings are generated, where n is a number of sets for the grouping sets operation. For example, the group generator module 110 outputs n groupings, each with (m+k+1) columns, for a SQL query including the grouping sets operation with n number of grouping sets, m number of grouping columns and k number of aggregates.
At block 310, for the SQL query including cube, rollup or grouping sets operations, a single group by operation is performed on the output of the group generator module to answer a query. For example, referring to
The computer system includes a processor 402 that may implement or execute machine readable instructions performing some or all of the methods, functions and other processes described herein. Commands and data from the processor 402 are communicated over a communication bus 404. The computer system also includes a main memory 406, such as a random access memory (RAM), where the machine readable instructions and data for the processor 402 may reside during runtime, and a secondary data storage 408, which may be non-volatile and stores machine readable instructions and data. The memory and data storage are examples of computer readable mediums. The memory 406 may include modules 420 including machine readable instructions residing in the memory 406 during runtime and executed by the processor 402. The modules 420 may include the modules 102, 106, 108, 110, 116, 118 and 122 of the apparatus shown in
The computer system may include an I/O device 410, such as a keyboard, a mouse, a display, etc. The computer system may include a network interface 412 for connecting to a network. Other known electronic components may be added or substituted in the computer system.
What has been described and illustrated herein is an example along with some of its variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the spirit and scope of the subject matter, which is intended to be defined by the following claims -- and their equivalents -- in which all terms are meant in their broadest reasonable sense unless otherwise indicated.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2012/037938 | 5/15/2012 | WO | 00 | 10/31/2014 |