A database is a collection of stored data that is logically related and that is accessible by one or more users or applications. A popular type of database is the relational database management system (RDBMS), which includes relational tables, also referred to as relations, made up of rows and columns (also referred to as tuples and attributes). Each row represents an occurrence of an entity defined by a table, with an entity being a person, place, thing, or other object about which the table contains information.
One of the goals of a database management system is to optimize the performance of queries for access and manipulation of data stored in the database. Given a target environment, an optimal query plan is selected, with the optimal query plan being the one with the lowest cost, e.g., response time, as determined by an optimizer. The response time is the amount of time it takes to complete the execution of a query on a given system.
Database applications, especially business intelligence tools, often produce large and complex SQL queries, some of which can be simplified to equivalent queries for more efficient processing. However, sometimes it is not feasible to request those application providers to produce more efficient queries because it would either require major changes in the applications or special treatment for certain cases.
Disclosed embodiments provide a system, method, and computer readable medium for optimizing query performance in a database system. In one embodiment, join predicates of a self outer join are evaluated. If each join predicate is respectively based on a common join attribute, and each join attribute has a not null constraint applied thereto, the self outer join may be re-written as a self inner join. In another embodiment, if not null and unique constraints are applied to each join attribute of an inner join featuring join predicates each respectively based on a common join attribute, the inner join may advantageously removed thereby resulting in a select operation.
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures, in which:
It is to be understood that the following disclosure provides many different embodiments or examples for implementing different features of various embodiments. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting.
As shown, the database system 100 includes one or more processing nodes 1051 . . . Y that manage the storage and retrieval of data in data-storage facilities 1101 . . . Y. Each of the processing nodes may host one or more physical or virtual processing modules, such as one or more access module processors (AMPs). Each of the processing nodes 1051 . . . Y manages a portion of a database that is stored in a corresponding one of the data-storage facilities 1101 . . . Y. Each of the data-storage facilities 1101 . . . Y includes one or more disk drives or other storage medium.
The system stores data in one or more tables in the data-storage facilities 1101 . . . Y. The rows 1151 . . . Y of the tables are stored across multiple data-storage facilities 1101 . . . Y to ensure that the system workload is distributed evenly across the processing nodes 1051 . . . Y. A parsing engine 120 organizes the storage of data and the distribution of table rows 1151 . . . Y among the processing nodes 1051 . . . Y and accesses processing nodes 1051 . . . Y via an interconnect 130. The parsing engine 120 also coordinates the retrieval of data from the data-storage facilities 1101 . . . Y in response to queries received from a user, such as one using a client computer system 135 connected to the database system 100 through a network 125 connection. The parsing engine 120, on receiving an incoming database query, applies an optimizer 122 component to the query to assess the best plan for execution of the query. Selecting the optimal query-execution plan includes, among other things, identifying which of the processing nodes 1051 . . . Y are involved in executing the query and which database tables are involved in the query, as well as choosing which data-manipulation techniques will serve best in satisfying the conditions of the query. To this end, the parser and/or optimizer may access a data dictionary 124 that specifies the organization, contents, and conventions of one or more databases. For example, the data dictionary 124 may specify the names and descriptions of various tables maintained by the MPP system 150 as well as fields of each database. Further, the data dictionary 124 may specify the type, length, and/or other various characteristics of the stored tables. Further, the optimizer may utilize statistics for making query assessments during construction of the query-execution plan. For example, database statistics may be used by the optimizer to determine data demographics, such as attribute minimum and maximum values and data ranges of the database. The database system typically receives queries in a standard format, such as the Structured Query Language (SQL) put forth by the American National Standards Institute (ANSI). The parsing engine 120 and data dictionary 124 may be implemented as computer-executable instruction sets tangibly embodied on a computer-readable medium, such as a memory device 142, that are retrieved by a processing module 140 and processed thereby.
Although the system and configuration depicted in
In one example system, the parsing engine 120 is made up of three components: a session control 300, a parser 305, and a dispatcher 310 as shown in
Consider an exemplary query for providing available credit for account holders:
Self outer join operations similar to the above may often be produced by a database application or tool. In accordance with disclosed embodiments, a self outer join may be advantageously re-written to a self inner join in particular scenarios. Left outer joins require mechanisms that track rows from the left table that do not match a row from the right table. Right outer joins and full outer joins require similar resources. Such mechanisms consume memory space and processing resources. Thus, re-writing a self outer join as a self inner join provides an improvement in the query processing.
Consider a self left outer join, for example, performed on a table R on join predicates (R1.J11=R2.J11 . . . and . . . R1.J1n=R2.J1n) where both R1 and R2 are aliases for table R, and J11, . . . J1n are attributes from R. In this instance, each join predicate respectively involves a common join attribute, e.g., the join predicate R1.J11=R2.J11 involves a common join attribute J11. Assume there are “NOT NULL” constraints on the join attributes J11-J1n. Because for any row in R there is at least one matching row (itself) from R, the left outer join can be rewritten to an inner join. For the example query, assume there are “NOT NULL” constraints on the join attributes Account_Nbr and SS_Num. Accordingly, the self left outer join may then be rewritten to a self inner join according to the following:
This query rewriting mechanism is applicable to self right outer joins and self full outer joins as well.
The conversion routine is invoked (step 602), and a self outer join is received (step 604). An evaluation may then be made to determine if not null constraints are associated with each join attribute (step 606). For example, the data dictionary 124 may be interrogated to evaluate the join attributes to determine if each join attribute of the self outer join has a not null constraint applied thereto. In the event that a not null constraint is not placed on each join attribute, the self outer join may then be executed (step 608), and the conversion routine cycle may then end (step 616).
Returning again to step 606, in the event that a not null constraint is placed on each join attribute, an evaluation may then be made to determine if each join predicate is respectively based on a common attribute (step 610). If each join predicate is not based on a respective common attribute, the self outer join may then be executed according to step 608.
If each join predicate is respectively based on a common attribute, the self outer join may then be rewritten as a self inner join (step 612), and the self inner join may then be executed (step 614). Alternatively, the self inner join may be evaluated to determine if the self inner join may be removed in accordance with an embodiment as described more fully hereinbelow with reference to
In accordance with embodiments, a self inner join may advantageously be eliminated in particular scenarios. Consider a self inner join on a table R according to join predicates (R1.J11=R2.J11 . . . and . . . R1.J1n=R2.J1n) where both R1 and R2 are aliases for R, and J11, . . . J1n are attributes from R. In this instance, each join predicate respectively involves a common join attribute, e.g., the join predicate R1.J11=R2.J11 involves a common join attribute J11. Assume there are unique constraints on the join attributes J11-J1n. Thus, for any row in R there is at most one matching row (itself) from R. Accordingly, the inner join may be eliminated. For the example query, assume there are unique constraints on the join attributes Account_Nbr and SS_Num. Therefore, the self inner join can be removed. However, since self inner-joining removes rows with NULL values in the join attributes, a WHERE condition may be added to the query to remove rows with NULL values in the join attributes after eliminating the self inner join. After properly renaming attributes in the select clause if necessary, the following simplified query may be obtained in accordance with disclosed embodiments:
If, however, the inner join evaluation is performed subsequent to the outer join evaluation described above with reference to
The conversion routine is invoked (step 702), and a self inner join is received (step 704). An evaluation is then made to determine if each join predicate is respectively based on a common attribute (step 706). If each join predicate is not based on a respective common attribute, the self inner join may then be executed according to step 708, and the join conversion routine cycle may then end (step 722).
Returning again to step 706, if each join predicate is respectively based on a common attribute, an evaluation may then be made to determine if there is a unique constraint placed on each join attribute (step 710), e.g., by interrogating the data dictionary 124. If there is not a unique constraint placed on each join attribute, the self inner join may then be executed according to step 708.
Returning again to step 710, if it is determined that there is a unique constraint placed on each join attribute, an evaluation may be made to determine if a not null constraint is associated with each join attribute (step 712). If a not null constraint is not applied to each join attribute, the self inner join may be eliminated (step 714) thereby resulting in a select operation, and a not null constraint may be added to the select statement on the join attributes of the original inner join (step 716). The resulting select operation may then be executed (step 720). If it is determined that a not null constraint is applied to each join attribute at step 712, the self inner join may then be eliminated (step 718), and the resulting select operation may then be executed according to step 720. The conversion routine cycle may then end according to step 722.
As described, mechanisms for optimizing query performance in a database system are provided. In one embodiment, join predicates of a self outer join are evaluated. If each join predicate is respectively based on a common join attribute, and each join attribute has a not null constraint applied thereto, the self outer join may be re-written as a self inner join advantageously resulting in less consumption of system resources and general query performance optimization. In another embodiment, if not null and unique constraints are applied to each join attribute of an inner join featuring join predicates each respectively based on a common join attribute, the inner join may advantageously removed thereby resulting in a select operation.
The flowcharts of
The illustrative block diagrams and flowcharts depict process steps or blocks that may represent modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Although the particular examples illustrate specific process steps or procedures, many alternative implementations are possible and may be made by simple design choice. Some process steps may be executed in different order from the specific description herein based on, for example, considerations of function, purpose, conformance to standard, legacy structure, user interface design, and the like.
Aspects of the disclosed embodiments may be implemented in software, hardware, firmware, or a combination thereof. The various elements of the system, either individually or in combination, may be implemented as a computer program product tangibly embodied in a machine-readable storage device for execution by a processing unit. Various steps of embodiments may be performed by a computer processor executing a program tangibly embodied on a computer-readable medium to perform functions by operating on input and generating output. The computer-readable medium may be, for example, a memory, a transportable medium such as a compact disk, a floppy disk, or a diskette, such that a computer program embodying aspects of the disclosed embodiments can be loaded onto a computer. The computer program is not limited to any particular embodiment, and may, for example, be implemented in an operating system, application program, foreground or background process, or any combination thereof, executing on a single processor or multiple processors. Additionally, various steps of embodiments may provide one or more data structures generated, produced, received, or otherwise implemented on a computer-readable medium, such as a memory.
Although disclosed embodiments have been illustrated in the accompanying drawings and described in the foregoing description, it will be understood that embodiments are not limited to the disclosed examples, but are capable of numerous rearrangements, modifications, and substitutions without departing from the disclosed embodiments as set forth and defined by the following claims. For example, the capabilities of the disclosed embodiments can be performed fully and/or partially by one or more of the blocks, modules, processors or memories. Also, these capabilities may be performed in the current manner or in a distributed manner and on, or via, any device able to provide and/or receive information. Still further, although depicted in a particular manner, a greater or lesser number of modules and connections can be utilized with the present disclosure in order to accomplish embodiments, to provide additional known features to present embodiments, and/or to make disclosed embodiments more efficient. Also, the information sent between various modules can be sent between the modules via at least one of a data network, an Internet Protocol network, a wireless source, and a wired source and via a plurality of protocols.