Users can request access to data by submitting a query to a database system, such as an Enterprise Data Warehouse (EDW). Queries can be executed in a number of manners, and some may be more efficient than others. An optimizer can utilize a variety of statistics to analyze a users' query and/or in optimizing possible query plans.
Organizations may utilize Enterprise Data Warehouses (EDWs), e.g., data repositories from a number of disparate sources, to store information and make it accessible within the organization. In some examples, an EDW can include a number of EDW clients, e.g., organizations, and/or functions within an organization. Each of the number of EDW clients can share resources, e.g., CPU resources, relational tables, and/or indexes, among other resources, within a particular EDW. Similarly, each of the EDW clients can have a workload which is managed by the EDW. These workloads can have varying service levels, e.g., service level agreements, that must be met. As used herein, a service level refers to the admission and execution rules for a particular service. For example, a service level can specify a particular throughput speed, e.g., 5 seconds, for processing of a query.
A server, e.g., database, in an EDW can receive queries from one or more client devices, e.g., computers, which are connected to the EDW. As used herein, a query can refer to a request to search for and/or retrieve data that is stored in a server connected to a network. A query can include a Structured Query Language (SQL) query, e.g., a query based on a computer language that is designed for the retrieval and management of data in relational database management systems. Further, a query plan can refer to an ordered series of steps performed to execute a query, e.g., access data in an SQL relational database management system. In some examples, a query plan can be presented as a query plan tree, e.g., a graphical representation displaying parallel instances of steps being performed to execute the query.
When a client, e.g., a client of the EDW, submits a particular query, an SQL compiler can produce a query plan to execute the particular query. An SQL compiler can be a process that compiles SQL statements, e.g., instructions received from a client to execute a query, into query plans. Additionally, an SQL optimizer can collaborate with the SQL compiler to generate an optimal query plan, e.g., the most efficient and/or fastest running query plan available from a plurality of possible query plans. An SQL optimizer can be a process that determines which of a plurality of possible query plans would be the most efficient, e.g., identify an optimal query plan.
Many times, an SQL optimizer identifies optimal query plans. However, periodically the SQL optimizer may identify a suboptimal query plan. As used herein, a suboptimal query plan refers to a query plan that is not the most efficient and/or fastest running query plan available from a plurality of possible query plans. A suboptimal query plan generated by a SQL compiler could take hours to complete and therefore disrupt a pre-determined service level agreement (SLA). The reasons for producing a suboptimal query plan could be inadequate statistics, cardinality estimation errors, premature exhaustion of search space due to reaching the CPU/memory limit, and/or defects in the SQL compiler.
In contrast, examples of the present disclosure allow for detection and repair of suboptimal query plans prior to execution, e.g., deployment. By detecting and repairing suboptimal query plans prior to execution, corrective steps can be implemented early enough to prevent disruption to SLAs. Further, detecting and repairing suboptimal query plans prior to execution in accordance with examples of the present disclosure can save time and expense for information technology (IT) professionals and can ensure optimal management and performance of workloads within an EDW.
The plurality of engines, e.g., 103, 104, as used herein can include a combination of hardware and software, e.g., program instructions, but at least includes hardware, that is configured to perform particular functions, tasks and/or actions. For example, the engines shown in
For example, the analysis engine 103 can include hardware and/or a combination of hardware and program instructions to perform a post optimization plan analysis to identify suboptimal query plans. A query plan and/or a plurality of query plans, produced by an SQL compiler and/or SQL optimizer can be input to the analysis engine 103 for post optimization plan analysis. As discussed previously, a query can be, for example, a request to search and/or retrieve data that is stored in a server, and a query plan can be a configuration for a number of operators, e.g., hardware and/or a combination of hardware and program instructions that is configured to perform a particular function, to be executed in an EDW to respond to a query. For instance, a join operator can include hardware and/or a combination of hardware and program instructions that is configured to perform a join function, e.g., identifying and joining all tuples satisfying a number user-specified join conditions. As used herein, a tuple can refer to a piece of data. For example, a data stream including a stream of events from traffic sensors and a data stream containing information from a number of weather monitor sensors, can be joined based on location, e.g., the common attribute between the two streams, to predict the probability of accidents due to bad weather conditions in real time. Other examples of operators can include a scan operator, e.g., hardware and/or a combination of hardware and program instructions to scan all tuples and search for a particular condition, a sort operator, e.g., hardware and/or a combination of hardware and program instructions to sort tuples into either ascending or descending order, and a groupby operator, e.g., hardware and/or a combination of hardware and program instructions to group all tuples based on user-specified criteria, among other operators.
The analysis engine 103 can identify suboptimal query plans from the query plans input from the SQL compiler and/or SQL optimizer. As used herein, an optimal query plan can be a query plan that includes a lowest cost, e.g., an amount of network and/or memory resource utilization, the amount of data to be processed, the amount of data to be read from the disks, and/or an amount of time needed for execution, for executing the query associated with the query plan. A suboptimal query plan can be a query plan that does not have the lowest cost, e.g., is not the most efficient and/or fastest running query plan, among a plurality of possible query plans that could be executed.
In some examples, a user and/or administrator of the EDW can specify patterns that can be used to identify suboptimal query plans. For instance, for each operator in a query plan, a user can specify conditions, e.g., name of a table and/or column, the degree of parallelism (DoP), input or output size (rows), the unique entry count (UEC), the number of file opens (OPENS), and/or the number of predicates attached to a column, and values associated with those conditions that are associated with a suboptimal query plan. As discussed further in connection with
The reoptimization engine 104 can include hardware and/or a combination of hardware and program instructions to reoptimize an identified suboptimal query plan before execution. Once an SQL compiler and/or SQL optimizer generates a list of optimized query plans, the analysis engine 103 can identify suboptimal query plans from among the optimized query plans generated. In some examples, the reoptimization engine 104 can be executed by the SQL compiler itself. The reoptimization engine 104 can apply directives, e.g., corrective actions, identified in a config file to the particular operator associated with the suboptimal query plan. As described more in connection with
The embodiments are not limited to the example engines shown in
The memory resource 211 can be a non-transitory machine readable medium, including one or more memory components capable of storing instructions that can be executed by processing resource 209 and may be integrated in a single device or distributed across multiple devices. Further, memory resource 211 may be fully or partially integrated in the same device as processing resource 209 or it may be separate but accessible to that device and processing resource 209. Thus, it is noted that the computing device 208 may be implemented on a participant device, on a server device, on a collection of server devices, and/or a combination of a participant, e.g., user, device and one or more server devices as part of a distributed computing environment, cloud computing environment, etc.
The memory resource 211 can be in communication with the processing resource 209 via a communication link, e.g., a path, 210. The communication link 210 can provide a wired and/or wireless connection between the processing resource 209 and the memory resource 211.
In the example of
Each of the plurality of modules 213, 214 can include instructions that when executed by the processing resource 209 can function as a corresponding engine as described in connection with
In some examples, the analysis module 213 can input to a post optimization analyzer, an optimized query plan generated by an SQL optimizer. Further, in some examples, the analysis module 213 can perform a post optimization plan analysis to identify a suboptimal query plan. For instance, the analysis module 213 can search a hash table storing information defining an operator, a condition, and a corrective action associated with a suboptimal query plan, in order to identify a suboptimal query plan. The analysis module 213 can search optimized query plans received from the SQL optimizer starting with a root of the optimized query plan, and ending with a leaf of the optimized query plan. As discussed further herein, content of the hash table can be read from a configuration file within an SQL compiler.
In some examples, the reoptimization module 214 can apply corrective actions to reoptimize the identified suboptimal query plan prior to execution of the identified suboptimal query plan. Further, the reoptimization module 214 can reconfigure an SQL optimizer with corrective actions identified from the hash table, reoptimize the suboptimal query plan a number of times, and send the reoptimized query plan to an executor in a database system for execution, such as in an EDW.
Embodiments are not limited to the example modules shown in
As illustrated in
In a number of examples, optimized query plans generated by the SQL optimizer 330 can be sent to the post optimization analyzer 326 and analyzed to identify suboptimal query plans based on user-specified conditions. For example, users of the EDW system can define in a production environment and store in a configuration file, e.g., config file 325, what constitutes suboptimal query plans.
Users, customers, or database administrators (hereinafter referred to as “users”) can learn behavior of query plans produced by the SQL optimizer 330 through knowledge discovery over the course of product usage. Using this knowledge, the users can gain insight about risk patterns, and based on the type of risk, can define conditions for operators which can lead to suboptimal plans. A user can specify each suboptimal query plan in a tuple form, such as: {<Operator>, <Condition>, <Action>}. That is, each tuple identifying a suboptimal query plan can define an operator, a condition associated with the operator, and an action to be taken if such conditions are satisfied. The <Operator> portion of the tuple can specify an operator, e.g., a relational operator, such as a scan operator, a join operator, or a sort operator, that can be found in a query plan tree. In some examples, the <Operator> portion of the tuple can further be defined by a suffix of LC, e.g., left-child, or RC, e.g., right-child, to indicate the left or right child of the operator, as represented in the query plan tree. The <Condition> portion of the tuple can describe elements of the suboptimal query plan in detail through a Boolean expression of attributes and thresholds. An attribute, as used herein, can be the name of a table or a column, the degree of parallelism (DoP) of the operator, input or output size in number of rows, a unique entry count (UEC), a number of file opens (OPENS), or a number pf predicates attached to a column. A threshold for an attribute, as used herein, can be an empirical value specific to the attribute and beyond which the particular operator performs in what the user would define as a suboptimal manner. For example, a serial plan that processes an excessive number of rows in one process can be expressed in tuple form as:
{root, DoP=1 and InputRows>1000000, <action>}
That is, the root operator, can have a condition including two attributes and associated values: DoP equal to 1, and input size greater than 1000000 rows. If the condition is satisfied for the root operator, e.g., the DoP is equal to 1 and the input size is greater than 1000000 rows, then a particular action can be taken.
The <action> portion of the tuple can specify a corrective action for the SQL compiler 322 to take, in response to identifying that the conditions for a particular operator are satisfied. Corrective actions can include applying existing SQL optimizer control switches, and/or overriding the cost comparison during pruning, e.g., as performed by the SQL optimizer 330. For example, a corrective action can include an instruction to the SQL optimizer 330 to avoid identifying a query plan meeting the conditions specified in the suboptimal query plan as an optimal query plan, despite the outcome of a cost comparison against other possible query plans. Some other examples of corrective actions can include a PARALLEL instruction, e.g., an instruction to produce a parallel plan regardless of cost, for the relational operator; a SORT instruction, e.g., an instruction to sort the output rows for the relational operator; and/or an NO-MDAM SCAN instruction, e.g., an instruction to perform a regular scan for particular relational operator.
As illustrated in
As illustrated in
While
At 441, the method 440 can include receiving an optimized query plan input from a SQL optimizer. For example, the SQL optimizer (330 illustrated in
At 442, the method 440 can include performing post optimization plan analysis to identify operators associated with a suboptimal query plan. For example, a post optimization analyzer, e.g., a post optimization plan analysis and correction phase of a SQL compiler, can compare elements in optimized query plans received from the SQL optimizer against patterns defined by a user and stored in a configuration file in the SQL compiler. A pattern, as used herein, can refer to a user defined pattern that identifies an operator and a condition associated with that operator that can be used to identify an element of a suboptimal query plan. An element of a suboptimal query plan can be identified when all of the conditions associated with that pattern are met. Some examples of suboptimal query plan patterns, as well as the conditions associated with them are as follows:
The post optimization plan analysis and correction phase can be executed from root to leaf of an optimized query plan. That is, the post optimization plan analysis and correction phase is executed for every operator included in a query plan, beginning with the last operator to be executed in the query plan and ending with the first operator to be executed in the query plan (e.g., the optimized plan tree is traversed from root to leaf).
At 443, the method 440 can include comparing details associated with an operator identified as being potentially associated with a suboptimal query plan, to conditions defining the suboptimal query plan. For example, as discussed in connection with
At 444, the method 440 can include applying directives to reoptimize the optimized query plan in the optimizer before execution in an EDW, in response to determining that all conditions defining a suboptimal query plan are satisfied. For example, as discussed in relation to
While
In the present disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how a number of examples of the disclosure can be practiced. These examples are described in sufficient detail to enable those of ordinary skill in the art to practice the examples of this disclosure, and it is to be understood that other examples can be used and that process, electrical, and/or structural changes can be made without departing from the scope of the present disclosure.
The figures herein follow a numbering convention in which the first digit corresponds to the drawing figure number and the remaining digits identify an element or component in the drawing. Elements shown in the various figures herein can be added, exchanged, and/or eliminated so as to provide a number of additional examples of the present disclosure. In addition, the proportion and the relative scale of the elements provided in the figures are intended to illustrate the examples of the present disclosure, and should not be taken in a limiting sense.
As used herein, “logic” is an alternative or additional processing resource to perform a particular action and/or function, etc., described herein, which includes hardware, e.g., various forms of transistor logic, application specific integrated circuits (ASICs), etc., as opposed to computer executable instructions, e.g., software firmware, etc., stored in memory and executable by a processor. Further, as used herein, “a” or “a number of” something can refer to one or more such things. For example, “a number of widgets” can refer to one or more widgets. Also, as used herein, “a plurality of” something can refer to more than one of such things.
The above specification, examples and data provide a description of the method and applications, and use of the system and method of the present disclosure. Since many examples can be made without departing from the spirit and scope of the system and method of the present disclosure, this specification merely sets forth some of the many possible embodiment configurations and implementations.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2014/035908 | 4/29/2014 | WO | 00 |