The present invention relates generally to processing of database statements with join predicates on range-partitioned tables.
With a range-partitioned table, if a database statement includes a predicate on a partitioning column, then it may be possible to exclude one or more partitions of the table from processing of the database statement, which would save time and resources. Elimination of partitions from database statement processing when join predicates are involved, however, has been limited. In particular, a partition of a range-partitioned table can only be eliminated from processing of a database statement with a join predicate on a partitioning column of the range-partitioned table when a nested-loop join operation is selected for optimization of the database statement and the range-partitioned table is designated as the inner table of the nested-loop join operation. Hence, if another type of join operation (e.g., a sort-merge join operation, a hash join operation, or the like) is selected for optimization of the database statement or if the range-partitioned table is designated as the outer table of the nested-loop join operation, then partition elimination would not be possible.
A method for processing database statements with join predicates on range-partitioned tables is provided. The method provides for receiving a database statement for processing, the database statement comprising a join predicate between a first table and a second table, the first table being a range-partitioned table that is partitioned based on values in a partition key of the first table, the partition key of the first table being one or more columns of the first table, responsive to the join predicate involving one or more leading columns of the partition key for the first table, determining whether a nested-loop join operation, a sort-merge join operation, or a hash join operations has been selected to join the first table and the second table, responsive to a nested-loop join operation being selected to join the first table and the second table, determining whether the first table is an outer table or an inner table of the nested-loop join operation, responsive to the first table being the outer table of the nested-loop join operation, determining whether data from the second table is piped, responsive to data from the second table being piped, calculating a maximum value and a minimum value for a join key of the second table, the join key of the second table being one or more columns of the second table involved in the join predicate, creating a partition elimination bitmap for the first table based on the maximum value and the minimum value calculated for the join key of the second table, the partition elimination bitmap including a bit for each partition in the first table, a value of each bit in the partition elimination bitmap denoting whether the partition in the first table corresponding to the bit is to be scanned or not, and pushing the partition elimination bitmap created for the first table onto a scan of the first table to exclude one or more partitions of the first table from the scan.
The present invention generally relates to processing of database statements with join predicates on range-partitioned tables. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. The present invention is not intended to be limited to the implementations shown, but is to be accorded the widest scope consistent with the principles and features described herein.
Range-partitioned tables are tables that are partitioned based on values in one or more columns. The term partitioning key is sometimes used to reference the one or more columns on which partitioning of a range-partitioned table is based. Each partition of a range-partitioned table includes rows in which values of the partitioning key for the range-partitioned table are within a pre-defined range.
For range-partitioned tables, database statement predicates that are on a partitioning column may be used to exclude certain partition(s) of the range-partitioned tables from database statement processing. This will lead to savings in time and resources. As an example, suppose a table ‘dp’ is created as follows:
Based on the table definition, table ‘dp’ includes five partitions. Suppose a database statement set forth below is received for processing:
The database statement includes two predicates, “a>50” and “a<150”. Based on these two predicates, only partitions “p1” and “p2” will need to be scanned because the other partitions “p3”, “p4”, and “p5” will not contain any data that will satisfy the two predicates in the database statement. As a result, partitions “p3”, “p4”, and “p5” can be excluded from processing of the database statement. The process of excluding partitions from consideration in database statement processing based on predicates is sometimes referred to as partition elimination or data partition elimination.
With join predicates, however, partition elimination has been limited. Currently, when a database statement includes a join predicate on a partitioning key of a range-partitioned table, partition elimination is only available if a nested-loop join operation is selected for the join and the range-partitioned table is designated as an inner table of the nested-loop join operation because the join predicate can be pushed on (e.g., applied to) the inner table access.
For example, suppose another database statement set forth below is received for processing:
The database statement includes a join predicate “dp.a=t.a”. Also, suppose that column ‘a’ of table ‘t’ only contains values in the range of 50 to 250. Thus, only partitions “p1”, “p2”, and “p3” of table ‘dp’ will need to be scanned as the other two partitions “p1” and “p2” will not contain any data that will satisfy the join predicate.
If a database system processing the database statement chooses to join tables ‘dp’ and ‘t’ using a nested-loop join operation with table ‘dp’ as an inner table of the nested-loop join operation, then data partition elimination can be performed by pushing the join predicate onto the scan of table ‘dp’. Thus, as each row of the outer table is retrieved, all rows of the inner table need not be scanned if the join predicate is pushed onto the scan of the inner table.
However, if the database system chooses to join tables ‘dp’ and ‘t’ using a nested-loop join operation with table ‘dp’ as an outer table of the nested-loop join operation, then a full scan of table ‘dp’ will be required. Likewise, if the database system optimizer chooses to join tables ‘dp’ and ‘t’ using another type of join operation, such as a sort-merge join operation, a hash join operation, and so forth, then the join predicate cannot be pushed onto the scans of table ‘dp’ and as a result, a full scan of table ‘dp’ will be required.
A list of terminology used throughout the following description is provided below.
Depicted in
At 14, a determination is made as to whether the join predicate involves one or more leading columns of the partition key for the first table. As an example, suppose the partition key for the first table is columns A, B, and C. If the join predicate involves column A of the first table, columns A and B of the first table, or columns A, B, and C of the first table, then the join predicate involves one or more leading columns of the partition key for the first table.
If it is determined at 14 that the join predicate does not involve one or more leading columns of the partition key for the first table (e.g., join predicate only involves columns B and/or C), then process 10 ends at 22. Otherwise, a determination is made at 16 as to whether a nested-loop join operation has been selected to join the first table and the second table. When a nested-loop join operation has been selected to join the first table and the second table, process 10 proceeds to process block 24 in
At 24 in
However, if the first table is the outer table of the nested-loop join operation, then a determination is made at 28 as to whether data from the second table is being piped. When data from the second table is not being piped (i.e., data from the second table is being dammed), a partition elimination bitmap is created for the first table at 30 while constructing a dam for the second table. At 32, the partition elimination bitmap created for the first table is pushed onto a scan of the first table to exclude one or more partitions of the first table from the scan.
In one implementation, the second table is a derived table (e.g., a table derived from other operations, such as one or more filtering operations, other join operations, and so forth). Hence, the partition elimination bitmap may not be created for the first table until a last dam before the nested-loop join operation is being constructed. Specifically, more than one dam may be constructed depending on the type and number of operations leading to the derivation of the second table.
Referring back to
For example, suppose a first table ‘RP’ is partitioned on columns ‘a’ and ‘b’ and includes the following partitions:
In addition, suppose a second table ‘TT’ with columns ‘a’ and ‘b’ includes the following values:
Also, suppose a database statement received for processing includes the following two join predicates:
Given the above assumptions, the join key for the second table ‘TT’ is (a, b), the minimum value for the join key of the second table ‘TT’ is (1, 1), and the maximum values for the join key of the second table ‘TT’ is (3, 1000). A partition elimination bitmap of [00111111] can then be created for the first table ‘RP’ based on the maximum and minimum join key values of the second table ‘TT’. Partitions 0 and 1 of the first table ‘RP’ will not be scanned because both the starting and ending values of those partitions are below the minimum join key value of the second table ‘TT’. As a result, none of the data in partitions 0 and 1 of the first table ‘RP’ will satisfy the two join predicates.
In one implementation, process 10 includes an additional process block (not shown) between process blocks 28 and 34 in which a determination is made as to whether a cost associated with calculating the maximum and minimum join key values of the second table will be less than a cost associated with scanning one or more partitions of the first table. For instance, the cost associated with calculating the maximum and minimum join key values of the second table may be more if an index has not been created on the join key of the second table.
If the cost associated with calculating the maximum and minimum values for the join key of the second table will be less than the cost associated with scanning one or more partitions of the first table, then process 10 proceeds to process block 34. Otherwise, process 10 ends at 22 because it will be inefficient to go through process blocks 34 through 38 when it will be less costly to simply scan all of the partitions of the first table.
Referring back to
With a hash join operation, a hash table is constructed for a table that is designated as an inner table of the hash join operation. When constructing the hash table, the inner table is scanned and rows of the inner table are copied into memory buffers. The memory buffers are then divided into sections based on hash values computed on a join key of the inner table. If the size of the inner table exceeds the memory available for buffering, buffers from selected sections may be written to temporary tables.
After the inner table has been processed, the table designated as an outer table of the hash join operation is scanned and rows from the outer table are matched to rows from the inner table by comparing hash values computed on join keys of the inner and outer tables. If the hash value on the join key of a row from the outer table matches the hash value on the join key of a row from the inner table, then the actual join key value of the row from the outer table is compared to the actual join key value of the row from the inner table.
Rows from the outer table that correspond to portions of the inner table not written to a temporary table are match immediately with rows from the inner table that are in memory. If the portion of the inner table corresponding to a row from the outer table was written to a temporary table, then the row from the outer table is also written to a temporary table. Subsequently, matching pairs of table portions from temporary tables are read, hash values are matched, and join predicates are checked.
In
On the other hand, when the first table is the inner table of the hash join operation, a maximum value and a minimum value of the join key for the second table are calculated at 48. The maximum and minimum values may be calculated using an index on the join key of the second table. A partition elimination bitmap is created for the first table based on the maximum and minimum join key values calculated for the second table at 50. The partition elimination bitmap created for the first table is then pushed onto a scan of the first table to exclude one or more partitions of the first table from the scan at 52.
In one implementation, process 10 includes an additional process block (not shown) between process blocks 46 and 48 in which a determination is made as to whether a cost associated with calculating the maximum and minimum join key values of the second table will be less than a cost associated with scanning one or more partitions of the first table. For instance, the cost associated with calculating the maximum and minimum join key values of the second table may be more if an index has not been created on the join key of the second table.
If the cost associated with calculating the maximum and minimum values for the join key of the second table will be less than the cost associated with scanning one or more partitions of the first table, then process 10 proceeds to process block 48. Otherwise, process 10 ends at 22 because it will be inefficient to go through process blocks 48 through 52 when it will be less costly to simply scan all of the partitions of the first table.
Referring back to
Referring back to
Graphical representations of different types of sort-merge join operations are illustrated in
In
At 58 in
A second row is retrieved from the second table at 68. At 70, a second partition elimination bitmap is created for the first table based on a value of a join key in the second row retrieved from the second table. The second partition elimination bitmap is then pushed onto a second scan of the first table at 72 to exclude one or more partitions of the first table from the second scan.
In one implementation, creation of the partition elimination bitmaps for the first table comprises preparing a sorted partition identifiers list for the first table. In the sorted partition identifiers list, identifiers for partitions of the first table are sorted based on values of the partition key for the first table. For example, suppose data from both the inner table and the outer table of a sort-merge operation are being piped. In addition, suppose the inner table is a range-partitioned table that includes 3 partitions identified as P0, P1, and P2, which are partitioned based on values in one column of the inner table. The one column is both a partition key and a join key for the inner table. Also, suppose the partition key value ranges for the 3 partitions of the inner table are defined as follows:
The sorted partition identifiers list for the inner table would then be {P1, P0, P2}. Once the sorted partition identifiers list for the first table has been prepared, an initial partition elimination bitmap is created for the first table with all partitions to be scanned. Hence, in the example above, the initial partition elimination bitmap for the inner table would be [111], which represents partitions P0, P1, and P2 in non-sorted order.
Based on the join key value of a particular row fetched from the second table, one or more partitions in the first table that include the particular join key value are identified. The sorted partition identifiers list for the first table can then be used to identify any partition in the first table preceding the one or more partitions that include the join key value of the particular row. The initial partition elimination bitmap can then be updated so that the preceding partitions identified will not be scanned.
In the example above, suppose a row retrieved from the outer table of the sort-merge join operation has a join key value of 151. Partition P0 in the inner table will then be identified as including the join key value of the row retrieved from the outer table. Based on the sorted partition identifiers list for the inner table, partition P1 precedes partition P0. As a result, the initial partition elimination bitmap for the inner table will be updated to [101] to reflect that partition P1 of the inner table need not be accessed.
Rows that are retrieved from the second table after the very first row is retrieved need not use the initial partition elimination bitmap since rows in the second table will be retrieved in sorted order. Thus, creating a partition elimination bitmap for a row retrieved from the second table may only involve updating a partition elimination bitmap created for a row previously retrieved from the second table, or may simply involve utilizing the partition elimination bitmap created for the row previously retrieved from the second table.
For instance, in the example above, suppose a next row retrieved from the outer table of the sort-merge join operation has a join key value of 199. The partition elimination bitmap [101] created for the inner table based on the row previously retrieved from the outer table can be reused since partition P0 in the inner table also includes the join key value of the next row retrieved from the outer table.
Suppose, in the example above, a third row retrieved from the outer table of the sort-merge join operation has a join key value of 450. Partition P2 will be identified as including the join key value of the third row retrieved from the outer table. Based on the sorted partition identifiers list for the inner table, partitions P1 and P0 both precede partition P2. As a result, the partition elimination bitmap [101] created for the inner table based on the second row retrieved will be updated to [001] to reflect that partitions P0 and P1 of the inner table need not be accessed.
Although
If it is determined at 60 in
If it is determined at 58 in
On the other hand, if it is determined at 80 that data from the first table is not being piped (i.e., data from the first table is being dammed), then a partition elimination bitmap is created for the first table at 86 while a dam is constructed for the second table. In one implementation, the second table is a derived table and the partition elimination bitmap is not created for the first table until a last dam before the sort-merge join operation is being constructed. At 88, the partition elimination bitmap created for the first table is pushed past an operation that dams data from the first table and onto a scan of the first table to exclude one or more partitions of the first table from being scanned.
The techniques discussed above in
Through aspects of the present invention, data partition elimination is now possible even when a join predicate on a partitioning key of a range-partitioned table cannot be pushed down onto the scan of the range-partitioned table. Hence, even if a sort-merge join operation or a hash join operation is selected for optimization of a database statement that includes a join predicate on a range-partitioned table or a nested-loop join operation with the range-partitioned table as the outer table is selected for optimization of the database statement, partitions of the range-partitioned table that will not satisfy the join predicate can be eliminated from processing.
The invention can take the form of an entirely hardware implementation, an entirely software implementation, or an implementation containing both hardware and software elements. In one aspect, the invention is implemented in software, which includes, but is not limited to, application software, firmware, resident software, microcode, etc.
Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include DVD, compact disk-read-only memory (CD-ROM), and compact disk-read/write (CD-R/W).
Memory elements 404a-b can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times the code must be retrieved from bulk storage during execution. As shown, input/output or I/O devices 408a-b (including, but not limited to, keyboards, displays, pointing devices, etc.) are coupled to data processing system 400. I/O devices 408a-b may be coupled to data processing system 400 directly or indirectly through intervening I/O controllers (not shown).
In the implementation, a network adapter 410 is coupled to data processing system 400 to enable data processing system 400 to become coupled to other data processing systems or remote printers or storage devices through communication link 412. Communication link 412 can be a private or public network. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
While various implementations for processing of database statements with join predicates on range-partitioned tables have been described, the technical scope of the present invention is not limited thereto. For example, the present invention is described in terms of particular systems having certain components and particular methods having certain steps in a certain order. One of ordinary skill in the art, however, will readily recognize that the methods described herein can, for instance, include additional steps and/or be in a different order, and that the systems described herein can, for instance, include additional or substitute components. Hence, various modifications or improvements can be added to the above implementations and those modifications or improvements fall within the technical scope of the present invention.