This invention is related to the field of electronic database management.
The generation of optimal execution plans is critical to the performance of applications. For example, a single SQL statement with very poor performance can bring an application to its knees. Sometimes a poorly performing SQL statement is due to user error, such as a blind query issued without filtering conditions that would have reduced the amount of data processed. Other times the SQL statement is well formed, but the associated execution plan that is generated by the optimizer is suboptimal.
The suboptimal plan results in a run-away execution of the query. In other words, the plan, when executed, causes a SQL statement to run for a long time with enormous use of system resources. The problem of fixing the execution plan is usually addressed through a manual SQL tuning process. This process involves a tuning expert analyzing the SQL statement as well as its associated execution plan, then determining that the problem lies in the execution plan and not in the way the SQL statement is used (for example, an accidental use of a Cartesian join by not joining one of the tables to any of the other tables in the query). The manual SQL analysis process is a time-consuming task.
After this analysis, the expert performs a manual SQL tuning process to influence the optimizer to generate a good plan. This involves the tuning expert applying one or more tuning actions to the statement. These actions may include: identifying and collecting missing statistics and refreshing stale statistics; changing the value of a configuration parameter that directly affects the plan generation methodology of the optimizer; adding one or more hints to the SQL statement that direct the optimizer toward the right plan; and creating a new access path (such as an index) or modifying an existing one to help avoid large scans of data. The manual SQL tuning process is also a time-consuming and complex task.
Many vendors have addressed the problem of run-away query execution by using a query governor control mechanism. The query governor can be either reactive or proactive. In a reactive mode, an execution-time threshold is set to abort any query whose cumulative execution time exceeds the threshold. In a proactive mode, an optimizer-estimated-time threshold is set, which is applied to the time the optimizer has estimated for the query to run. Any query having an estimated run-time that exceeds the threshold is never run. With either of these methods, no attempt is made to look at the root cause of the problem.
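For illustration only, the two governor modes described above can be sketched in Python as follows. The function names and the use of seconds as the time unit are assumptions made for this sketch and do not correspond to any particular vendor's interface.

```python
def proactive_admit(estimated_seconds: float, threshold_seconds: float) -> bool:
    """Proactive mode: run a query only if the optimizer's estimate is within the threshold."""
    return estimated_seconds <= threshold_seconds


def reactive_should_abort(elapsed_seconds: float, threshold_seconds: float) -> bool:
    """Reactive mode: abort a query whose cumulative execution time exceeds the threshold."""
    return elapsed_seconds > threshold_seconds
```

Neither check examines why a query is slow; each only compares a single time value against a fixed limit.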
Some vendors have used the idea of setting execution-time thresholds at various places in the execution plan to detect a case of run-away query execution. When a threshold is crossed during query execution, the run is aborted and the query sent back to the optimizer for re-optimization. But this method suffers from two drawbacks: setting of the thresholds and monitoring them at runtime incurs overhead, which can be significant and undesirable especially for light-weight queries, and the method of aborting a run and re-optimizing a query can be quite disruptive, especially if the run is aborted right before it was about to complete.
A run-away query execution is automatically identified by a background process that periodically looks at each of the currently executing queries and compares the current execution time with the execution time estimated by the optimizer. Each query execution having a negative execution time difference can be automatically identified as a run-away query execution. The query execution plans that result in run-away executions can then be automatically tuned to produce more efficient execution plans.
Overview
The embodiments of the invention are described using the term “SQL”; however, the invention is not limited to this exact database query language, and indeed may be used in conjunction with other database query languages and constructs.
The automatic performance monitoring of query executions identifies run-away query executions, then performs a re-optimization for the corresponding execution plans in a background process. The automatic prevention of run-away query executions may abort a current execution of a query run if the automatic process has produced an improved plan in the background, and further, has determined a benefit to aborting the current execution and performing an execution of the new plan.
This process can be implemented by an automatic SQL tuning optimizer and a SQL tuning base. In one embodiment, the run-away query execution is identified by a background process that periodically looks at each of the currently executing queries and compares the time spent in executing it so far (current-time) vs. the time the optimizer has estimated the execution to take (estimate-time). The top N queries with the largest negative difference (estimate-time−current-time) may be selected as run-away executions. An alternate method of identifying run-away query executions can be based on the current-time, that is, the process can select the top N queries with the longest current execution time as run-away query executions.
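A minimal Python sketch of the two selection methods described above is given below. The `RunningQuery` record, its field names, and the function names are hypothetical and are introduced only for illustration; how the background process obtains the list of currently executing queries is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunningQuery:
    sql_id: str           # hypothetical identifier for the executing statement
    current_time: float   # seconds spent executing so far
    estimate_time: float  # seconds the optimizer estimated for the whole execution


def select_by_negative_difference(queries: List[RunningQuery], top_n: int) -> List[RunningQuery]:
    """Return the top N queries with the largest negative (estimate-time - current-time)."""
    overdue = [q for q in queries if q.estimate_time - q.current_time < 0]
    # The most negative difference sorts first.
    overdue.sort(key=lambda q: q.estimate_time - q.current_time)
    return overdue[:top_n]


def select_by_current_time(queries: List[RunningQuery], top_n: int) -> List[RunningQuery]:
    """Alternate method: return the top N queries with the longest current execution time."""
    return sorted(queries, key=lambda q: q.current_time, reverse=True)[:top_n]
```

In this sketch a query that is still within its estimate is never selected by the first method, whereas the second method may select it if it has simply been running longer than the others.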
The automatic tuning optimizer (ATO), in a background process, then optimizes the execution plan for each query having a run-away execution by performing various analyses of the corresponding SQL statement, such as automatic identification and correction of inaccurate statistics, cardinality estimates, and cost estimates related to the statement. If the execution plan built by the ATO is different from the one that is currently executing, the ATO can estimate how much more time the current plan execution is going to take to complete (remaining-time), as well as estimate how much time the new plan will take to execute (new-time). If the new-time is less than the remaining-time, then the current plan run may be aborted and replaced with the new plan.
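The replacement test described above may be sketched in Python as follows. Representing a plan by a string identifier, and the function and parameter names, are assumptions made for this sketch only.

```python
def should_replace_plan(current_plan_id: str, new_plan_id: str,
                        remaining_time: float, new_time: float) -> bool:
    """Return True when the current run may be aborted in favor of the new plan.

    remaining_time: estimated seconds the current plan still needs to complete.
    new_time: estimated seconds the new plan needs to execute from the start.
    """
    if new_plan_id == current_plan_id:
        # The ATO produced the same plan, so there is nothing to switch to.
        return False
    return new_time < remaining_time
```

For example, a current run with 600 seconds remaining would be replaced by a different plan estimated at 45 seconds, but not by one estimated at 900 seconds.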
Since the ATO uses validated estimates of the cost, selectivity and cardinality, it can compute the total execution time of the new plan much more accurately. Similarly, it can regenerate the original run-away plan that is currently executing with validated estimates to compute its remaining execution time. Because the identification of run-away query executions, and the automatic generation of improved plans for the corresponding queries are performed by the ATO in the background, this automatic process is transparent to the database user.
Automatic Identification and Tuning of Run-Away Execution Plans
The automatic prevention of run-away query executions is performed by a process as shown in
Then, a new execution plan, along with a time estimate for executing the new plan, can be generated using the profile. Also, a revised estimate of the execution time of the run-away execution plan is generated using the profile, 150. If the new plan can be executed faster than the currently executing plan, then the current plan is identified as a run-away plan. A second comparison of execution times is performed to determine whether to abort the current execution of the run-away plan and execute the new plan, or to allow the run-away plan to run to completion, 160. If the remaining execution time of the run-away plan is less than the execution time of the new plan, then the current plan is allowed to finish. If the execution time of the new plan is less than the remaining execution time of the currently executing run-away plan, then the run-away plan is aborted and the new plan is executed.
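The two comparisons described above can be sketched as a single three-way decision. The `Decision` names and the function signature are hypothetical. The case in which the two times are exactly equal is not specified in the text; this sketch assumes the current plan is then left alone.

```python
from enum import Enum


class Decision(Enum):
    NOT_RUNAWAY = "new plan is not faster; keep the current plan"
    LET_FINISH = "run-away plan is close to done; let it complete"
    ABORT_AND_SWITCH = "abort the run-away plan and execute the new plan"


def decide(revised_current_total: float, current_remaining: float,
           new_total: float) -> Decision:
    """Apply the two execution-time comparisons, all times in seconds.

    revised_current_total: revised estimate for the whole current plan.
    current_remaining: estimated time the current plan still needs.
    new_total: estimated time to execute the new plan from the start.
    """
    # First comparison: is the new plan faster than the current plan overall?
    if new_total >= revised_current_total:
        return Decision.NOT_RUNAWAY
    # Second comparison: is restarting with the new plan faster than finishing?
    if new_total < current_remaining:
        return Decision.ABORT_AND_SWITCH
    return Decision.LET_FINISH
```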
Automatic Prevention Architecture
An example of a system 200 for automatic prevention of run-away queries is shown in
SQL Profiling
A profiling process is performed by the automatic tuning optimizer to produce a set of tuning actions used in generating an execution plan for a SQL statement. The profiling process verifies that statistics are not missing or stale, validates the estimates made by the query optimizer for intermediate results, and determines the correct optimizer settings. Tuning actions are created based on the results of the profiling process, to provide missing statistics for an object, validate intermediate result estimates, and select the best setting for optimizer parameters. Then, the Automatic Tuning Optimizer builds a SQL Profile for these tuning actions.
The statistics analysis verifies that statistics are not missing or stale. The query optimizer logs the types of statistics that are actually used during the plan generation process, in preparation for the verification process. For example, when a SQL statement contains an equality predicate, it logs the number of distinct values of the column, whereas for a range predicate it logs the minimum and maximum column values. Once the logging of used statistics is complete, the query optimizer checks if each of these statistics is available on the associated query object (i.e., table, index or materialized view). If the statistic is available then it verifies whether the statistic is up-to-date. To verify the accuracy of a statistic, it samples data from the corresponding query object and compares it to the statistic. If a statistic is found to be missing, the query optimizer will generate auxiliary information to supply the missing statistic. If a statistic is available but stale, it will generate auxiliary information to compensate for staleness.
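The verification pass described above may be sketched in Python as follows. The dictionary keys, the action labels, and the 10% relative tolerance used to decide that a statistic is stale are all assumptions of this sketch; the text does not specify how staleness is measured or what form the auxiliary information takes.

```python
from typing import Dict, Iterable, Tuple


def verify_statistics(used: Iterable[str],
                      stored: Dict[str, float],
                      sampled: Dict[str, float],
                      tolerance: float = 0.10) -> Dict[str, Tuple[str, float]]:
    """Return auxiliary information for each logged statistic that is missing or stale.

    used: names of the statistics logged during plan generation.
    stored: statistics currently available on the query objects.
    sampled: values derived from sampling the corresponding objects.
    """
    auxiliary = {}
    for name in used:
        sample_value = sampled[name]
        if name not in stored:
            # Missing statistic: supply it from the sample.
            auxiliary[name] = ("supply_missing", sample_value)
            continue
        stored_value = stored[name]
        # Relative drift between the stored value and the sampled value.
        drift = abs(sample_value - stored_value) / max(abs(stored_value), 1e-9)
        if drift > tolerance:
            # Stale statistic: record the sampled value as compensation.
            auxiliary[name] = ("compensate_stale", sample_value)
    return auxiliary
```

Statistics that are present and within the tolerance produce no entry, so an empty result means nothing needs to be supplied or compensated.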
One feature of a cost-based query optimizer is its ability to derive the size of intermediate results. For example, the optimizer estimates the number of rows from applying table filters when deciding which join algorithm to pick. One factor that causes the optimizer to generate a sub-optimal plan is a wrong estimate of intermediate result sizes. Wrong estimates can be caused by a combination of the following factors: (1) the predicate (filter or join) is too complex to use standard statistical methods to derive the number of rows (e.g., the columns are compared through a complex expression like (a*b)/c=10); (2) the data distribution of the column used in the predicate is skewed, and there is no histogram, leading the optimizer to assume a uniform data distribution; or (3) the data in column values is correlated but the optimizer is not aware of it, causing the optimizer to assume data independence. During SQL Profiling, the Automatic Tuning Optimizer validates the estimates made by the query optimizer, and compensates for missing information or wrong estimates. The validation process may involve running part of the query on a sample of the input data.
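The effect of correlated columns, and the way a sample can expose it, can be illustrated with a small Python sketch. The function names and the example data are hypothetical, and scaling the sample match count up to the table size is a simple assumed validation method, not necessarily the one used by the Automatic Tuning Optimizer.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[str, str]  # (city, country) in this example


def independence_estimate(total_rows: int, selectivities: Sequence[float]) -> float:
    """Estimate rows by multiplying per-predicate selectivities (assumes independence)."""
    estimate = float(total_rows)
    for selectivity in selectivities:
        estimate *= selectivity
    return estimate


def sampled_estimate(sample: Sequence[Row], predicate: Callable[[Row], bool],
                     total_rows: int) -> float:
    """Estimate rows by evaluating the full predicate on a sample and scaling up."""
    if not sample:
        raise ValueError("sample must not be empty")
    matches = sum(1 for row in sample if predicate(row))
    return total_rows * matches / len(sample)


# Example data: city and country are perfectly correlated.
rows = [("Paris", "France")] * 500 + [("Rome", "Italy")] * 500
sample = rows[::10]  # every tenth row, 100 rows in total

naive = independence_estimate(len(rows), [0.5, 0.5])
validated = sampled_estimate(
    sample, lambda r: r[0] == "Paris" and r[1] == "France", len(rows))
```

Here `naive` is 250.0 while `validated` is 500.0, so the independence assumption underestimates the intermediate result by a factor of two.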
The Automatic Tuning Optimizer uses the past execution history of a SQL statement to determine the correct optimizer settings. For example, if the execution history shows that a SQL statement is only partially executed most of the time, then the appropriate setting will be to optimize it for the first n rows, where n is derived from the execution history. This constitutes a customized parameter setting for the SQL statement. (Note that past execution statistics are available in the Automatic Workload Repository (AWR), presented later.)
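One possible way to derive such a setting from execution history is sketched below in Python. The text does not say how n is derived; taking the median number of rows fetched across the partial executions is an assumption of this sketch, as are the `"FIRST_ROWS"` and `"ALL_ROWS"` labels and the shape of the history records.

```python
import math
from statistics import median
from typing import List, Optional, Tuple


def choose_optimizer_goal(history: List[Tuple[int, bool]]) -> Tuple[str, Optional[int]]:
    """Pick an optimizer goal from past executions.

    history: one (rows_fetched, ran_to_completion) pair per past execution.
    Returns ("FIRST_ROWS", n) when most executions were partial, else ("ALL_ROWS", None).
    """
    partial_rows = [rows for rows, completed in history if not completed]
    # Require a strict majority of partial executions.
    if len(partial_rows) * 2 <= len(history):
        return ("ALL_ROWS", None)
    # Assumed derivation of n: median rows fetched in partial executions, rounded up.
    return ("FIRST_ROWS", math.ceil(median(partial_rows)))
```

With a history of three partial executions fetching 10, 20 and 30 rows and one complete execution, this sketch selects the first-rows goal with n equal to 20.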
The tuning information produced from the statistics, estimates, and settings analyses is stored in a SQL Profile. Once a SQL Profile is created, it is used in conjunction with the existing statistics by the compiler to produce a well-tuned plan for the corresponding SQL statement.
The automatic prevention of run-away queries can identify a plan that is a potential run-away plan. The process analyzes the SQL statement for the plan to determine if the potential run-away plan is caused by a bad plan. For example, the process can create a profile for the statement, use the profile to generate a new plan, and compare the new plan to the old plan to determine if the old plan is a run-away plan. The process can also use the profile to determine whether the run-away plan is close to finishing, and therefore should run to completion, or if the run-away plan should be aborted and the new plan should be executed in its place. Thus, the automatic prevention of run-away query executions eliminates the overhead incurred by conventional methods, such as monitoring of thresholds and aborting a run just before it finishes.
According to one embodiment of the invention, computer system 400 performs specific operations by processor 404 executing one or more sequences of one or more instructions contained in system memory 406. Such instructions may be read into system memory 406 from another computer readable medium, such as static storage device 408 or disk drive 410. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention.
The term “computer readable medium” as used herein refers to any medium that participates in providing instructions to processor 404 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as disk drive 410. Volatile media includes dynamic memory, such as system memory 406. Transmission media includes coaxial cables, copper wire, and fiber optics, including wires that comprise bus 402. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, carrier wave, or any other medium from which a computer can read.
In an embodiment of the invention, execution of the sequences of instructions to practice the invention is performed by a single computer system 400. According to other embodiments of the invention, two or more computer systems 400 coupled by communication link 420 (e.g., LAN, PSTN, or wireless network) may perform the sequence of instructions to practice the invention in coordination with one another. Computer system 400 may transmit and receive messages, data, and instructions, including program code (i.e., application code), through communication link 420 and communication interface 412. Received program code may be executed by processor 404 as it is received, and/or stored in disk drive 410 or other non-volatile storage for later execution.
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.
This application claims the benefit of U.S. Provisional Application No. 60/500,490, filed Sep. 6, 2003, which is incorporated herein by reference in its entirety. This application is related to co-pending applications “SQL TUNING SETS,” Attorney Docket No. OI7036272001; “AUTO-TUNING SQL STATEMENTS,” Attorney Docket No. OI7037042001; “SQL PROFILE,” Attorney Docket No. OI7037052001; “GLOBAL HINTS,” Attorney Docket No. OI7037062001; “SQL TUNING BASE,” Attorney Docket No. OI7037072001; “AUTOMATIC LEARNING OPTIMIZER,” Attorney Docket No. OI7037082001; “METHOD FOR INDEX TUNING OF A SQL STATEMENT, AND INDEX MERGING FOR A MULTI-STATEMENT SQL WORKLOAD, USING A COST-BASED RELATIONAL QUERY OPTIMIZER,” Attorney Docket No. OI7037102001; “SQL STRUCTURE ANALYZER,” Attorney Docket No. OI7037112001; “HIGH-LOAD SQL DRIVEN STATISTICS COLLECTION,” Attorney Docket No. OI7037122001; “AUTOMATIC SQL TUNING ADVISOR,” Attorney Docket No. OI7037132001, all of which are filed Sep. 7, 2004 and are incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
60500490 | Sep 2003 | US