This invention is related to the field of database management.
Currently, database administrators (DBAs) and application developers spend a large amount of time trying to tune poorly performing and resource-intensive SQL statements (commonly referred to as bad SQL). However, this is often a very challenging task. First, it requires a high level of expertise in several complex areas, such as query optimization and SQL design. Second, it is a time-consuming process because each statement is unique and needs to be tuned individually. Third, it requires an intimate knowledge of the database (e.g., view definitions, indexes, table sizes) as well as the application (e.g., process flow, system load). Finally, SQL tuning is a continuous task because the SQL workload and the database are always changing. As a result, tuning is often done on a trial-and-error basis, resulting in loss of productivity.
Often a SQL statement can be a high load SQL statement simply because it is badly written. This usually happens when there are different, but not necessarily semantically equivalent, ways to write a statement to produce the same result. Knowing which of these alternate forms is most efficient in producing the query result is a difficult and daunting task for application developers, since it requires both a deep knowledge of the properties of the data they are querying and a very good understanding of the semantics and performance of SQL constructs.
To help DBAs and application developers overcome these challenges, several software companies have developed diagnostic tools that help identify SQL performance issues and suggest actions to fix them. However, these tools are not integrated with the database compiler, which is the system component that is most responsible for SQL performance. Indeed, these tools interpret the optimization information outside of the database to perform the tuning, so their tuning results are less robust and limited in scope. Moreover, they cannot directly tackle the internal challenges faced in producing an optimal execution plan.
The SQL Structure Analyzer component of the Automatic Tuning Optimizer performs what-if analysis to recognize missed query rewrite opportunities and makes SQL restructuring recommendations for the user to undertake.
A method to address structural performance problems of a database query language statement is described. The method includes receiving a database query language statement at an optimizer, evaluating choices in a search space to generate an execution plan for the statement, and producing annotations to record one or more reasons for selecting or rejecting each choice while generating the execution plan. The method can further include the examination of the annotations associated with the costly operators in the chosen plan and producing appropriate SQL restructuring recommendations to improve the execution performance of the statement.
Overview
The embodiments of the invention are described using the term “SQL”; however, the invention is not limited to this exact database query language, and may be used in conjunction with other database query languages and constructs.
A SQL Structure Analyzer is a component of an Automatic Tuning Optimizer that addresses structural performance problems of a statement. This component can be used by programmers, during application development, to detect poorly written SQL statements and to apply alternative ways of rewriting them to improve performance. The analyzer can determine whether a SQL statement is a high load statement simply because it is badly written. For example, different, but not necessarily semantically equivalent, ways to write a statement to produce the same result can be examined to determine which of these alternate forms is most efficient. Although this is a difficult and daunting task for application developers, since it requires both a deep knowledge of the properties of the data they are querying and a very good understanding of the semantics and performance of SQL constructs, the optimizer can perform the structural analysis process in an efficient manner.
Performing the structural analysis of a statement within and by the automatic tuning optimizer itself, while generating an execution plan for the statement, allows the procedure to identify and gather information about the statement's structure that will help produce an efficient plan. The method can use this information to compare different, but not necessarily equivalent, ways of writing a statement to produce the same result.
The query optimizer can perform extensive query transformations while preserving the semantics of the original query. Some of the transformations are based on heuristics (i.e. internal rules), but many others are based on cost-based selection. Examples of query transformations include subquery unnesting, materialized view (MV) rewrite, simple and complex view merging, rewrite of grouping sets into UNIONs, and other types of transformations.
The query optimizer may apply a transformation when the query can be rewritten into a semantically equivalent form. Semantic equivalence can be established when certain conditions are met; for example, a particular column in a table has the non-null property. However, these conditions may not exist in the database but can be enforced by the application. The SQL Structure Analyzer performs what-if analysis to recognize missed query rewrite opportunities and makes recommendations for the user to undertake.
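To illustrate why a non-null condition can decide whether two forms of a query are equivalent, the following sketch compares a NOT IN subquery with a NOT EXISTS subquery. It is only an illustration under stated assumptions: it uses Python's sqlite3 module as a stand-in database, and the tables `customers` and `orders` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER);
    CREATE TABLE orders (customer_id INTEGER);  -- nullable on purpose
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (1), (NULL);
""")

NOT_IN = """SELECT id FROM customers
            WHERE id NOT IN (SELECT customer_id FROM orders)
            ORDER BY id"""
NOT_EXISTS = """SELECT id FROM customers c
                WHERE NOT EXISTS (SELECT 1 FROM orders o
                                  WHERE o.customer_id = c.id)
                ORDER BY id"""

def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# With a NULL in orders.customer_id the two forms disagree.
with_null = (ids(NOT_IN), ids(NOT_EXISTS))

# Once the join column holds no NULLs, both forms agree.
conn.execute("DELETE FROM orders WHERE customer_id IS NULL")
without_null = (ids(NOT_IN), ids(NOT_EXISTS))

print(with_null)
print(without_null)
```

The database cannot assume the column is free of NULLs unless that property is declared, which is why a rewrite of this kind is offered to the user as a recommendation rather than applied automatically.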
When a rewrite is not possible, the optimizer generates diagnostic information in the form of internal annotations to remember the reasons why the particular rewrite was not possible. The annotations can include necessary conditions that were not met, as well as various choices that were available during the plan generation process. After a best plan is generated, the optimizer examines the annotations and produces appropriate recommendations for improving the execution plan. For example, the recommendations can be suggestions on how to rewrite the statement, as well as suggestions for changing the schema, in order to improve the performance of the statement. In addition, the optimizer can use the annotations to produce rationale and informative messages about potential improvements that can be made to the statement, in order to educate the application developers who code the SQL statement.
One possible output from the SQL Structure Analyzer can be a rewritten SQL text that the user can accept as an alternative form of the original statement. If the user accepts the alternate form, the user passes the rewritten SQL text to the query optimizer as input in place of the original SQL text.
The SQL structural analysis is a cost-based process, wherein it considers the annotations associated with costly operators in the annotated execution plan. As a result, the process generates recommendations for costly nodes and operators that, when reconsidered by changing the structure of the query statement, will significantly improve the performance of the execution plan. For example, a costly node can be defined as a node having an individual cost that is greater than a threshold, such as 10% of the total plan cost. The recommendation for the costly operator is then mapped to the corresponding node in the plan tree, as well as to the operator in the SQL statement.
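The threshold test described above can be sketched as follows. This is a minimal illustration, not the optimizer's actual implementation; the `PlanNode` class, the operator names, and the cost figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PlanNode:
    operator: str
    cost: float  # individual cost of this node, not a cumulative subtree cost

def costly_nodes(nodes: List[PlanNode], fraction: float = 0.10) -> List[PlanNode]:
    """Return the nodes whose individual cost exceeds `fraction` of the plan total."""
    total = sum(node.cost for node in nodes)
    return [node for node in nodes if node.cost > fraction * total]

# Hypothetical plan with a total cost of 100.
plan = [
    PlanNode("TABLE ACCESS FULL", 40.0),
    PlanNode("SORT UNIQUE", 55.0),
    PlanNode("INDEX RANGE SCAN", 5.0),
]
costly = [node.operator for node in costly_nodes(plan)]
print(costly)
```

Only the nodes returned by `costly_nodes` would have their annotations examined, which keeps the recommendations focused on the parts of the plan where a rewrite can matter.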
An example of a device that includes the SQL Structure Analyzer is shown in
The annotations can include alternatives that were considered and rejected. An alternative structure can be rejected because it may cause a change in the query results. An alternative may also be rejected for other reasons as well. For example, when the optimizer explores the possibility of merging a view, it runs its tests to determine if it is logically possible to merge a view. If this is not possible, the analyzer records the reason for not being able to merge the view in the execution plan. If the optimizer can merge the view, but it decides not to merge it, then the analyzer can record the reason for not choosing to merge the view.
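One way to picture such annotations is as small records attached to a plan operator. The sketch below is an assumption about a possible shape, not the format the optimizer uses; the names `Annotation`, `Outcome`, `PlanOperator`, and `consider_view_merge` are hypothetical, and using a cost comparison as the reason for declining a legal merge is an illustrative choice.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Outcome(Enum):
    APPLIED = "applied"
    NOT_POSSIBLE = "not possible"  # a necessary condition was not met
    NOT_CHOSEN = "not chosen"      # legal, but the optimizer declined it

@dataclass
class Annotation:
    choice: str
    outcome: Outcome
    reason: str

@dataclass
class PlanOperator:
    name: str
    annotations: List[Annotation] = field(default_factory=list)

def consider_view_merge(op: PlanOperator, mergeable: bool, blocker: str,
                        merged_cost: float, unmerged_cost: float) -> bool:
    """Record why a view merge was applied, impossible, or declined."""
    if not mergeable:
        op.annotations.append(Annotation("merge view", Outcome.NOT_POSSIBLE, blocker))
        return False
    if merged_cost >= unmerged_cost:
        reason = f"merged cost {merged_cost} is not below unmerged cost {unmerged_cost}"
        op.annotations.append(Annotation("merge view", Outcome.NOT_CHOSEN, reason))
        return False
    op.annotations.append(Annotation("merge view", Outcome.APPLIED, "merged plan is cheaper"))
    return True

op = PlanOperator("VIEW v_sales")
consider_view_merge(op, mergeable=False, blocker="a necessary condition was not met",
                    merged_cost=0.0, unmerged_cost=0.0)
consider_view_merge(op, mergeable=True, blocker="", merged_cost=80.0, unmerged_cost=60.0)
for note in op.annotations:
    print(note.outcome.value, "-", note.reason)
```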
The annotated plan is then examined by the optimizer to generate recommendations for rewriting the statement, as well as recommendations on schema changes to improve the performance of the SQL statement. For example, after gathering information about the structure of the statement, the optimizer can identify an expensive operator in the statement. Using the annotations for the expensive node, the optimizer can access a knowledge base or a rule base to retrieve a rule for replacing the expensive operator in the statement with a less expensive operator.
If the expensive operator is, for example, a UNION operator, the optimizer can find a rule in the knowledge base for this operator, such as “replace UNION with UNION ALL.” The optimizer can determine if applying this rule to the query will reduce the cost of the operator. If so, the optimizer can recommend that the user rewrite the statement by replacing the UNION operator with the UNION ALL operator. However, with this particular rewrite, the query results may be different, because the UNION ALL operator will not remove duplicates from the results, but the UNION operator will. Thus, the recommendation will state both the performance benefit expected from the rewrite and the potential for a different query result. If the user decides that the improved performance is worth the trade-off in the results, the user can apply this recommendation to the SQL statement.
In addition to applying rules stored in a knowledge base, the optimizer can accept rules from the user and apply them while considering the annotations. Also, the user can disable certain rules in the knowledge base to prevent the query optimizer from giving recommendations that cannot be implemented by the user.
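The rule lookup described above, including user-supplied and user-disabled rules, can be sketched as follows. This is a hedged illustration only: the `Rule` class, the `KNOWLEDGE_BASE` dictionary, the `recommend` function, and the cost figures are hypothetical and are not taken from any real optimizer interface.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

@dataclass(frozen=True)
class Rule:
    operator: str
    replacement: str
    caveat: str

# Hypothetical built-in rule base, keyed by the expensive operator.
KNOWLEDGE_BASE: Dict[str, Rule] = {
    "UNION": Rule("UNION", "UNION ALL",
                  "duplicates are no longer removed, so results may differ"),
}

def recommend(operator: str, current_cost: float, rewritten_cost: float,
              user_rules: Optional[Dict[str, Rule]] = None,
              disabled: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Return a recommendation only if an enabled rule exists and lowers the cost."""
    rules = {**KNOWLEDGE_BASE, **(user_rules or {})}  # user rules override built-ins
    rule = rules.get(operator)
    if rule is None or operator in disabled or rewritten_cost >= current_cost:
        return None
    return (f"Replace {rule.operator} with {rule.replacement} "
            f"(estimated cost {current_cost} -> {rewritten_cost}). "
            f"Caveat: {rule.caveat}.")

print(recommend("UNION", 55.0, 20.0))
print(recommend("UNION", 55.0, 20.0, disabled=frozenset({"UNION"})))
```

Carrying the caveat inside the rule keeps the performance benefit and the possible change in results together in a single message, so the user can weigh the trade-off before applying the rewrite.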
There are various causes of poor performance, which are related to the structure of a SQL statement, that can be identified and overcome by using the structural analysis process. These causes can be syntax-based, semantics-based, or design issues.
An example of a semantic-based factor that can be analyzed to improve performance is a UNION operator in a SQL statement. The replacement of the UNION operator with the semantically different UNION ALL operator may provide an equivalent result if duplicate rows are not in the result. For example, if the UNION ALL operator is used for tables that have different data, such as ‘last year's sales’ and ‘this year's sales,’ the UNION ALL operator can provide the same result as the UNION operator, because the result of the operation has no duplicate rows, making the duplicate elimination performed by the UNION operator redundant. Thus, an analysis of the structure provides a basis to recommend replacing UNION with UNION ALL, eliminating an expensive duplicate elimination procedure from the execution plan.
Another example is the use of the semantic-based NOT IN subquery. When this semantic-based construct is replaced by a corresponding but not semantically equivalent NOT EXISTS subquery, the result can be a significant performance boost. This replacement can be recommended by the analysis process if NULL values are not present in the related join columns, thus ensuring that the same result is produced by either of these operators. Another example is
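The sales example can be checked with a small experiment. The sketch below uses Python's sqlite3 module as a stand-in database; the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_last_year (order_id INTEGER, amount INTEGER);
    CREATE TABLE sales_this_year (order_id INTEGER, amount INTEGER);
    INSERT INTO sales_last_year VALUES (1, 100), (2, 250);
    INSERT INTO sales_this_year VALUES (3, 100), (4, 75);
""")

def rows(set_operator):
    """Combine both tables with the given set operator and return sorted rows."""
    sql = (f"SELECT order_id, amount FROM sales_last_year {set_operator} "
           f"SELECT order_id, amount FROM sales_this_year")
    return sorted(conn.execute(sql).fetchall())

# No row appears twice, so duplicate elimination has nothing to remove.
disjoint_same = rows("UNION") == rows("UNION ALL")

# Add a row that also exists in the other table: the results now differ.
conn.execute("INSERT INTO sales_this_year VALUES (2, 250)")
overlap_same = rows("UNION") == rows("UNION ALL")

print(disjoint_same, overlap_same)
```

The second comparison shows why the rewrite is presented as a recommendation with a caveat: it is safe only when the user knows the combined result contains no duplicate rows.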
Syntax-based constructs are generally used to specify predicates in a SQL statement. The corresponding performance attributes of syntax-based constructs are therefore related to the specification of predicates in the SQL statement. For example, if a predicate such as col=:bnd is used with col and :bnd having different types, then such a predicate cannot be used as an index driver. Similarly, a predicate involving a function or expression (e.g., func(col)=:bnd, col1+col2=:bnd) on an indexed column prevents the query optimizer from using an index as an access path. As a result, a predicate that involves a function may not be used as an index driver unless there is a functional index on the function itself. Therefore, rewriting the statement by simplifying the complex predicate can enable index access paths, leading to a better execution plan.
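The effect of wrapping an indexed column in a function can be observed in a query plan. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in; it is not the optimizer described here, the table `emp` and its indexes are hypothetical, and the exact wording of the plan text varies between SQLite versions. An index on an expression plays the role of the functional index mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    CREATE INDEX emp_name ON emp(name);
""")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, ("SMITH",))
    return " ".join(row[3] for row in cursor)

# A bare column predicate can be driven by the index on name.
plain = plan("SELECT * FROM emp WHERE name = ?")

# Wrapping the column in a function hides it from that index.
wrapped = plan("SELECT * FROM emp WHERE upper(name) = ?")

# An index on the expression itself restores indexed access.
conn.execute("CREATE INDEX emp_upper_name ON emp(upper(name))")
wrapped_indexed = plan("SELECT * FROM emp WHERE upper(name) = ?")

for text in (plain, wrapped, wrapped_indexed):
    print(text)
```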
Design issues related to performance include an accidental use of a Cartesian product, for example, which occurs when one of the tables is not joined to any of the other tables in a SQL statement. This problem is frequent when the query involves a large number of tables. Therefore, rationale and informative messages can be produced to educate programmers who code SQL statements about potential design improvements to the statements.
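A simple check for the situation described above, a table that is not joined to any other table, can be sketched as follows. This is an illustrative assumption about how such a check might look; the function `unjoined_tables` and its inputs are hypothetical. It flags only tables with no join predicate at all and does not detect groups of tables that are joined among themselves but disconnected from the rest.

```python
from typing import Iterable, List, Set, Tuple

def unjoined_tables(tables: Iterable[str],
                    join_predicates: Iterable[Tuple[str, str]]) -> List[str]:
    """Return tables that take part in no join predicate with another table."""
    table_list = list(tables)
    if len(table_list) < 2:
        return []  # a single-table query cannot form a Cartesian product
    joined: Set[str] = set()
    for left, right in join_predicates:
        if left != right:  # a predicate within one table is not a join
            joined.update((left, right))
    return [table for table in table_list if table not in joined]

# Hypothetical query: orders is joined to customers, regions is left unjoined.
suspects = unjoined_tables(
    ["orders", "customers", "regions"],
    [("orders", "customers")],
)
print(suspects)
```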
For example, during the development stage, developers are generally focused on writing SQL statements that produce a desired result, rather than on designing the statements for performance. The informative messages can help the developers improve performance by identifying design mistakes and offering alternatives. For example, the structural analysis method can identify a mistake that causes a SQL statement to perform poorly, such as a type mismatch between a column and its predicate value, which essentially disables the use of an index even if one is available, and then inform the user of the mistake in the design of the statement.
An example of a SQL structure analysis method that can be performed by the automatic tuning optimizer to detect poor SQL constructs falling into one or more categories listed above is shown in
The automatic tuning optimizer generates internal annotations and diagnostic information, 210, and associates them with the execution plan operators. The annotations are produced while the automatic tuning optimizer is evaluating the various choices during the process of building the execution plan. Each annotation can be quite extensive and can include the reasons for making a decision as well as the alternatives that were considered and the corresponding reasons for rejecting them. For example, when the automatic tuning optimizer explores the possibility of merging a view, it will check necessary conditions to see if it is logically possible to merge the view. If it is not possible, it can record the reason for not merging the view. If it can merge the view but decides not to, it can record the reason for not doing so.
After the optimal execution plan has been built, the automatic tuning optimizer examines the costly operators in the annotated execution plan. For example, a costly operator can be defined as one whose individual cost is more than 10% of the total plan cost. The automatic tuning optimizer examines the annotations associated with each of the costly operators and produces appropriate recommendations, 220. The automatic tuning optimizer also provides the rationale behind each of its recommendations. For example, a rationale can provide an explanation for using a recommended SQL construct in place of the original one to improve the cost, and hence the performance, of the corresponding execution plan.
The SQL structure recommendations allow a developer or the optimizer to rewrite a problematic SQL statement. Therefore, the SQL structure analysis method can be used to improve SQL statements while they are being developed, before they are deployed into a production system or a packaged application. Another important benefit of the SQL structure recommendations is that they can help educate the developers in writing well-formed SQL statements.
According to one embodiment of the invention, computer system 300 performs specific operations by processor 304 executing one or more sequences of one or more instructions contained in system memory 306. Such instructions may be read into system memory 306 from another computer readable medium, such as static storage device 308 or disk drive 310. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention.
The term “computer readable medium” as used herein refers to any medium that participates in providing instructions to processor 304 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as disk drive 310. Volatile media includes dynamic memory, such as system memory 306. Transmission media includes coaxial cables, copper wire, and fiber optics, including wires that comprise bus 302. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, carrier wave, or any other medium from which a computer can read.
In an embodiment of the invention, execution of the sequences of instructions to practice the invention is performed by a single computer system 300. According to other embodiments of the invention, two or more computer systems 300 coupled by communication link 320 (e.g., LAN, PSTN, or wireless network) may perform the sequence of instructions to practice the invention in coordination with one another. Computer system 300 may transmit and receive messages, data, and instructions, including program code (i.e., application code), through communication link 320 and communication interface 312. Received program code may be executed by processor 304 as it is received, and/or stored in disk drive 310 or other non-volatile storage for later execution.
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.
This application claims the benefit of U.S. Provisional Application No. 60/500,490, filed Sep. 6, 2003, which is incorporated herein by reference in its entirety. This application is related to applications “SQL TUNING SETS,” with U.S. application Ser. No. 10/936,449, now published as U.S. Publication No. 2005/0125393; “SQL PROFILE,” with U.S. application Ser. No. 10/936,205, now published as U.S. Publication No. 2005/0125452; “AUTO-TUNING SQL STATEMENTS,” with U.S. application Ser. No. 10/935,908, now published as U.S. Publication No. 2005/0120000; “GLOBAL HINTS,” with U.S. application Ser. No. 10/936,781, now published as U.S. Publication No. 2005/0125398; “SQL TUNING BASE,” with U.S. application Ser. No. 10/936,468, now published as U.S. Publication No. 2005/0097091; “AUTOMATIC LEARNING OPTIMIZER,” with U.S. application Ser. No. 10/935,906, now published as U.S. Publication No. 2005/0119999; “AUTOMATIC PREVENTION OF RUN-AWAY QUERY EXECUTION,” with U.S. application Ser. No. 10/936,779, now published as U.S. Publication No. 2005/0177557; “METHOD FOR INDEX TUNING OF A SQL STATEMENT, AND INDEX MERGING FOR A MULTI-STATEMENT SQL WORKLOAD, USING A COST-BASED RELATIONAL QUERY OPTIMIZER,” with U.S. application Ser. No. 10/936,469, now published as U.S. Publication No. 2005/0187917; “HIGH LOAD SQL DRIVEN STATISTICS COLLECTION,” with U.S. application Ser. No. 10/936,427, now published as U.S. Publication No. 2005/0138015; “AUTOMATIC SQL TUNING ADVISOR,” with U.S. application Ser. No. 10/936,778, now published as U.S. Publication No. 2005/0125427, all of which are filed Sep. 7, 2004 and are incorporated herein by reference in their entirety.
Number | Name | Date | Kind |
---|---|---|---|
5140685 | Sipple et al. | Aug 1992 | A |
5260697 | Barrett et al. | Nov 1993 | A |
5398183 | Elliott | Mar 1995 | A |
5408653 | Josten et al. | Apr 1995 | A |
5481712 | Silver et al. | Jan 1996 | A |
5504917 | Austin | Apr 1996 | A |
5544355 | Chaudhuri et al. | Aug 1996 | A |
5577240 | Demers et al. | Nov 1996 | A |
5634134 | Kumai et al. | May 1997 | A |
5724569 | Andres | Mar 1998 | A |
5737601 | Jain et al. | Apr 1998 | A |
5761660 | Josten et al. | Jun 1998 | A |
5765159 | Srinivasan | Jun 1998 | A |
5781912 | Demers et al. | Jul 1998 | A |
5794227 | Brown | Aug 1998 | A |
5794229 | French et al. | Aug 1998 | A |
5806076 | Ngai et al. | Sep 1998 | A |
5860069 | Wright | Jan 1999 | A |
5870760 | Demers et al. | Feb 1999 | A |
5870761 | Demers et al. | Feb 1999 | A |
5940826 | Heideman et al. | Aug 1999 | A |
5963933 | Cheng et al. | Oct 1999 | A |
5963934 | Cochrane et al. | Oct 1999 | A |
5991765 | Vethe | Nov 1999 | A |
6052694 | Bromberg | Apr 2000 | A |
6122640 | Pereira | Sep 2000 | A |
6195653 | Bleizeffer et al. | Feb 2001 | B1 |
6212514 | Eberhard et al. | Apr 2001 | B1 |
6275818 | Subramanian et al. | Aug 2001 | B1 |
6321218 | Guay et al. | Nov 2001 | B1 |
6330552 | Farrar et al. | Dec 2001 | B1 |
6349310 | Klein et al. | Feb 2002 | B1 |
6353818 | Carino, Jr. | Mar 2002 | B1 |
6356889 | Lohman et al. | Mar 2002 | B1 |
6366901 | Ellis | Apr 2002 | B1 |
6366903 | Agrawal et al. | Apr 2002 | B1 |
6374257 | Guay et al. | Apr 2002 | B1 |
6397207 | Bleizeffer et al. | May 2002 | B1 |
6397227 | Klein et al. | May 2002 | B1 |
6434545 | MacLeod et al. | Aug 2002 | B1 |
6434568 | Bowman-Amuah | Aug 2002 | B1 |
6442748 | Bowman-Amuah | Aug 2002 | B1 |
6460027 | Cochrane et al. | Oct 2002 | B1 |
6460043 | Tabbara et al. | Oct 2002 | B1 |
6493701 | Ponnekanti | Dec 2002 | B2 |
6496850 | Bowman-Amuah | Dec 2002 | B1 |
6513029 | Agrawal et al. | Jan 2003 | B1 |
6529901 | Chaudhuri et al. | Mar 2003 | B1 |
6560606 | Young | May 2003 | B1 |
6571233 | Beavin et al. | May 2003 | B2 |
6594653 | Colby et al. | Jul 2003 | B2 |
6598038 | Guay et al. | Jul 2003 | B1 |
6615223 | Shih et al. | Sep 2003 | B1 |
6701345 | Carley et al. | Mar 2004 | B1 |
6714943 | Ganesh et al. | Mar 2004 | B1 |
6721724 | Galindo-Legaria et al. | Apr 2004 | B1 |
6728719 | Ganesh et al. | Apr 2004 | B1 |
6728720 | Lenzie | Apr 2004 | B1 |
6744449 | MacLeod et al. | Jun 2004 | B2 |
6763353 | Li et al. | Jul 2004 | B2 |
6804672 | Klein et al. | Oct 2004 | B1 |
6816874 | Cotner et al. | Nov 2004 | B1 |
6839713 | Shi et al. | Jan 2005 | B1 |
6850925 | Chaudhuri et al. | Feb 2005 | B2 |
6865567 | Oommen et al. | Mar 2005 | B1 |
6910109 | Holman et al. | Jun 2005 | B2 |
6912547 | Chaudhuri et al. | Jun 2005 | B2 |
6915290 | Bestgen et al. | Jul 2005 | B2 |
6931389 | Bleizeffer et al. | Aug 2005 | B1 |
6934701 | Hall, Jr. | Aug 2005 | B1 |
6947927 | Chaudhuri et al. | Sep 2005 | B2 |
6961931 | Fischer | Nov 2005 | B2 |
6999958 | Carlson et al. | Feb 2006 | B2 |
7007013 | Davis et al. | Feb 2006 | B2 |
7031958 | Santosuosso | Apr 2006 | B2 |
7047231 | Grasshoff et al. | May 2006 | B2 |
7058622 | Tedesco | Jun 2006 | B1 |
7080062 | Leung et al. | Jul 2006 | B1 |
7139749 | Bossman et al. | Nov 2006 | B2 |
7146363 | Waas et al. | Dec 2006 | B2 |
7155426 | Al-Azzawe | Dec 2006 | B2 |
7155459 | Chaudhuri et al. | Dec 2006 | B2 |
7272589 | Guay et al. | Sep 2007 | B1 |
7302422 | Bossman et al. | Nov 2007 | B2 |
7353219 | Markl et al. | Apr 2008 | B2 |
20020073086 | Thompson et al. | Jun 2002 | A1 |
20020120617 | Yoshiyama et al. | Aug 2002 | A1 |
20020198867 | Lohman et al. | Dec 2002 | A1 |
20030018618 | Bestgen et al. | Jan 2003 | A1 |
20030065648 | Driesch et al. | Apr 2003 | A1 |
20030088541 | Zilio et al. | May 2003 | A1 |
20030093408 | Brown et al. | May 2003 | A1 |
20030110153 | Shee | Jun 2003 | A1 |
20030115183 | Abdo et al. | Jun 2003 | A1 |
20030126143 | Roussopoulos et al. | Jul 2003 | A1 |
20030130985 | Driesen et al. | Jul 2003 | A1 |
20030135478 | Marshall et al. | Jul 2003 | A1 |
20030154216 | Arnold et al. | Aug 2003 | A1 |
20030177137 | MacLeod et al. | Sep 2003 | A1 |
20030182276 | Bossman et al. | Sep 2003 | A1 |
20030187831 | Bestgen et al. | Oct 2003 | A1 |
20030200204 | Limoges et al. | Oct 2003 | A1 |
20030200537 | Barsness et al. | Oct 2003 | A1 |
20030229621 | Carlson et al. | Dec 2003 | A1 |
20030229639 | Carlson et al. | Dec 2003 | A1 |
20040002957 | Chaudhuri et al. | Jan 2004 | A1 |
20040003004 | Chaudhuri et al. | Jan 2004 | A1 |
20040019587 | Fuh et al. | Jan 2004 | A1 |
20040034643 | Bonner et al. | Feb 2004 | A1 |
20040181521 | Simmen et al. | Sep 2004 | A1 |
20040210563 | Zait et al. | Oct 2004 | A1 |
20040215626 | Colossi et al. | Oct 2004 | A1 |
20050033734 | Chess et al. | Feb 2005 | A1 |
20050097078 | Lohman et al. | May 2005 | A1 |
20050097091 | Ramacher et al. | May 2005 | A1 |
20050102305 | Chaudhuri et al. | May 2005 | A1 |
20050119999 | Zait et al. | Jun 2005 | A1 |
20050120000 | Ziauddin et al. | Jun 2005 | A1 |
20050120001 | Yagoub et al. | Jun 2005 | A1 |
20050125393 | Yagoub et al. | Jun 2005 | A1 |
20050125398 | Das et al. | Jun 2005 | A1 |
20050125427 | Dageville et al. | Jun 2005 | A1 |
20050125452 | Ziauddin et al. | Jun 2005 | A1 |
20050138015 | Dageville et al. | Jun 2005 | A1 |
20050177557 | Ziauddin et al. | Aug 2005 | A1 |
20050187917 | Lawande et al. | Aug 2005 | A1 |
20050251523 | Rajamani et al. | Nov 2005 | A1 |
20060004828 | Rajamani et al. | Jan 2006 | A1 |
20060167883 | Boukobza | Jul 2006 | A1 |
20070038618 | Kosciusko et al. | Feb 2007 | A1 |
Number | Date | Country | |
---|---|---|---|
20050120001 A1 | Jun 2005 | US |
Number | Date | Country | |
---|---|---|---|
60500490 | Sep 2003 | US |