1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for continual generation of index advice.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. The most basic requirements levied upon computer systems, however, remain little changed. A computer system's job is to access, manipulate, and store information. Computer system designers are constantly striving to improve the way in which a computer system can deal with information.
Information stored on a computer system is often organized in a structure called a database. A database is a grouping of related structures called ‘tables,’ which in turn are organized in rows of individual data elements. The rows are often referred to as ‘records,’ and the individual data elements are referred to as ‘fields’ or ‘columns.’ In this specification generally, therefore, an aggregation of fields is referred to as a ‘data structure’ or a ‘record,’ and an aggregation of records is referred to as a ‘table.’ An aggregation of related tables is called a ‘database.’
A computer system typically operates according to computer program instructions in computer programs. A computer program that supports access to information in a database is typically called a database management system or a ‘DBMS.’ A DBMS is responsible for helping other computer programs access, manipulate, and save information in a database.
A DBMS typically supports access and management tools to aid users, developers, and other programs in accessing information in a database. One such tool is the structured query language, ‘SQL.’ SQL is query language for requesting information from a database. Although there is a standard of the American National Standards Institute (‘ANSI’) for SQL, as a practical matter, most versions of SQL tend to include many extensions. Here is an example of a database query expressed in SQL:
This SQL query accesses information in a database by selecting records from two tables of the database, one table named ‘stores’ and another table named ‘transactions.’ The records selected are those having value “Minnesota” in their store location fields and transactions for the stores in Minnesota. In retrieving the data for this SQL query, an SQL engine will first retrieve records from the stores table and then retrieve records from the transaction table. Records that satisfy the query requirements then are merged in a ‘join.’
Enterprise application environments support huge database-oriented business applications that present a substantial performance demands to a DBMS. Such large applications include, for example, enterprise resource planning (‘ERP’) applications, customer relations management (‘CRM’) applications, and supply chain management (‘SCM’) applications. These giant application environments all have one thing in common, they need to operate efficiently or a business will suffer financially. Tuning these large, complex applications can be a daunting task. The options users have are limited. A useful way to leverage database performance is to create indexes—because performance-enhancing modifications to applications often are practically impossible to make or, if possible, cost prohibitive.
An index is a set of pointers to rows of a database table, that is, a list of locations of rows in a table sorted by the contents of one or more specified columns. Each index is based on the values of data in one or more table columns. An index is an object that is separate from the data in the table indexed by the index. When a user requests a DBMS to create an index, the DBMS builds this structure and maintains it automatically.
Indexes may be used to speed up access to a table. Indexes also can serve a logical data design purpose. A unique index, for example, allows no entry of duplicate values in columns of a table, thereby guaranteeing that no two rows of a table are exactly the same. Indexes can also be created to specify ascending or descending order of the values in a column.
An ‘index key’ is the column or columns on which an index is defined. The structure of the key of an index affects the usefulness of the index. The order of the columns in the key has no effect on index creation, but the order can affect how the index is used.
A DBMS typically includes software tools or utility programs for database performance analysis. Such tools or utilities often include a database monitor program or index advisor capable of recommending or ‘advising’ indexes as a means of helping a database administrator tune a database for performance. The advice is given back to the user in different forms. On IBM's DB2 UDB for iSeries, for example, the information can be seen by collecting and interrogating Database Monitor output with the STRDBMON command. The industry approach requires advanced knowledge of the specific platform and database. This makes tuning the database an action limited to an expert and even then, can be a tedious, time consuming task.
Methods, apparatus, and computer program products are disclosed for continual generation of index advice that include generating an index advice for an index of a table in a computer database and recording values of statistical attributes of the index advice accumulated across repeated generations of the index advice. The generating and recording typically are carried out continually without user intervention. Embodiments include recording values of attributes of the index advice that specify an index. Typical embodiments also include, continually and without user intervention, recording values of attributes of the index advice that characterize usefulness of an index and recording values of attributes of the index advice that characterize cost of creating an index.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of exemplary embodiments of the invention as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts of exemplary embodiments of the invention.
Exemplary methods, apparatus, and products for continual generation of index advice according to embodiments of the present invention are described with reference to the accompanying drawings, beginning with
In this example, job execution engine (104) will pass the operating system commands from job (102) to an operating system for execution and pass the SQL queries from job (102) to SQL module (112) for execution. Job execution engine (104) passes the SQL queries to SQL module (112) through application programming interface (‘API’) (107) of database management system (‘DBMS’) (105). DBMS (105) provides database management functions for database (114). DBMS (1064 exposes API (107) to enable applications, including, for example, job execution engine (104) to access functions of the DBMS, including, for example, SQL module (112). The ‘SQL’ command illustrated in job (102) is a function made available through API (107).
The system of
Database statistics (116) are typically implemented as metadata of a table, such as, for example, metadata of tables of database (114). Database statistics may include, for example:
These three database statistics are presented for explanation only, not for limitation. The use of any database statistics as will occur to those of skill in the art is well within the scope of the present invention.
Optimizer (106) uses database statistics (116) from database (114) for optimizing SQL queries against database (114). Optimizer (106) may notify statistics engine (108) when the optimizer attempts to use databases statistics for a column of a table, for example, and finds the database statistics missing or stale. Statistics engine (108) generates the missing or stale statistics.
Optimizer (106) is improved according to embodiments of the present invention to support continual generation of index advice by generating an index advice for an index of a table in a computer database and providing the index advice to index advice record engine (110). Index advice record engine (110) is a module of computer program instructions in the SQL module (112) improved according to embodiments of the present invention to record values of statistical attributes and other attributes of an index advice provided to it by the optimizer (106). In this example, index advice record engine (110) records values of attributes of index advice across repeated generations of an index advice in index advice records (118). Optimizer (106) and index advice record engine (110) work together to implement continual generation of index advice according to embodiments of the present invention so that generating index advice and recording attributes of index advice across repeated generations of the index advice is carried out continually without user intervention.
Continual generation of index advice in accordance with the present invention is generally implemented with computers, that is, with automated computing machinery. For further explanation, therefore,
Stored in RAM (168) is DBMS (105), computer program instructions for database management. The DBMS (105) of
Also stored in RAM (168) is an operating system (154). Operating systems useful in computers according to embodiments of the present invention include UNIX™, Linux™, MicrosoftXP™, AIX™, IBM's i5/OS™, and others as will occur to those of skill in the art. Operating system (154) and DBMS (105) in the example of
Computer (152) of
The example computer of
The exemplary computer (152) of
For further explanation,
The method of
Detecting (302) a need for an index advice may be implemented by detecting by database statistics that an answer set of a query for which there is no matching index is much smaller than the table against which the query is asserted—so that using an index with the query can significantly improve performance of the query. Detecting a need for an index advice may be implemented by heuristics, inferring selectivity of a query from the structure of the query itself. A query on one non-unique field, for example, is likely to select many records. A query with several selection fields is likely to select fewer records. A query on a unique field is likely to select even fewer records. Detecting a need for an index advice may be implemented by determining that a query indicates a need to order a response, but no index matches the requested ordering, such as, for example, when a query contains an ORDER BY clause or a GROUP BY clause but there is no matching index. Other ways of detecting a need for an index advice may occur to those of skill in the art, and all such ways are well within the scope of the present invention.
The method of
Attributes of the index advice that specify an index optionally also may include, depending on the characteristics of the underlying DBMS, the location of the table to be indexed. The location of the table to be indexed may be implemented as a schema name or a library, folder, or subdirectory where the table is stored. The use of a table location to specify an index depends on the characteristics of the underlying DBMS because a particular DBMS may track table locations by table name, for example, with no need for separate specification of a table location.
Attributes of the index advice that specify an index optionally also may include, depending on the characteristics of the underlying DBMS, one or more names of leading order-independent column keys of the table for the index. Leading order-independent column keys are key fields for an index that are useful while analyzing the index advisor information by providing more combinations for indexes to match the queries being executed. By limiting the number of indexes created, the maximum performance benefit may be achieved with the lowest index maintenance cost.
Attributes of the index advice that specify an index optionally also may include, depending on the characteristics of the underlying DBMS, a name of a national language sort sequence table—if one is in use when the index advice is generated, and the location of the sort sequence table if there is one. A sort sequence defines how characters in a character set relate to each other when they are compared and ordered. Different sort sequences are useful for those who want their data ordered for a specific language. For example, lists can be ordered as they are normally seen for a specific language. A sort sequence can also be used to treat certain characters as equivalent, for instance, a and A. It is important to remember that the data itself is not altered by the sort sequence. A weighted representation of the data is used for the comparison. Since a sort sequence table affects the query which generated the index advice, it is useful to understand when a sort sequence table has been used, to maximize the use of any new indexes.
The method of
The method of
In the method of
In the method of
R=((P×C)+E)/C+1,
where R is the current value of the running average, P is a previously calculated and stored value of the running average, C is a cumulative count of the number of times that the index advice has been generated for the index—recorded previously when P was calculated and stored, and E is a current estimate of the time required to execute a query for which the index advice was generated.
The method of
In the method of
In the method of
The method of
In the method of
For further explanation,
The columns of attributes of index advice visible in text box (404) include a column labeled ‘Table’ containing the name of the table for which an index advice was generated. Columns of index advice attributes visible in text box (404) also include a column labeled ‘Columns’ containing the names of columns of the table to be used in forming the advised index, that is, the key columns for the index. Columns of index advice attributes visible in text box (404) also include a column labeled ‘Type’ containing the type of index advised, ‘BR’ representing a binary radix index and ‘EV’ representing an encoded vector index. The contents of columns Table, Columns, and Type represent values of attributes of an index advice that specify an index.
Columns of index advice attributes visible in text box (404) also include a column labeled “Count” containing a cumulative count of the number of times that an index advice has been generated for an index. Columns of index advice attributes visible in text box (404) also include a column labeled “LgEst” standing for ‘Largest Estimate’ and containing a largest estimate of the time required to execute a query for which an index advice was generated. Columns of index advice attributes visible in text box (404) also include a column labeled “AvEst” standing for ‘Average Estimate’ and containing a running average of estimates of the time required to execute a query for which the index advice was generated. The contents of the Count column, the LgEst column, and the AvEst column represent statistical attributes of an index advice.
Columns of index advice attributes visible in text box (404) also include a column labeled “Stamp” containing a timestamp representing the time when an index advice for an index was last generated. The contents of the Stamp column represent attributes of the index advice that characterize usefulness of an index.
Columns of index advice attributes visible in text box (404) also include a column labeled “Time” containing an estimate of the time required to create an index. The contents of the Time column represent attributes of the index advice that characterize cost of creating an index.
In view of the explanations provided in this specification, readers now will recognize that the benefits of continual generation of index advice according to embodiments of the present invention include:
Exemplary embodiments of the present invention are described largely in the context of a fully functional computer system for continual generation of index advice. Readers of skill in the art will recognize, however, that the present invention also may be embodied in a computer program product disposed on signal bearing media for use with any suitable data processing system. Such signal bearing media may be transmission media or recordable media for machine-readable information, including magnetic media, optical media, or other suitable media. Examples of recordable media include magnetic disks in hard drives or diskettes, compact disks for optical drives, magnetic tape, and others as will occur to those of skill in the art. Examples of transmission media include telephone networks for voice communications and digital data communications networks such as, for example, Ethernets™ and networks that communicate with the Internet Protocol and the World Wide Web. Persons skilled in the art will immediately recognize that any computer system having suitable programming means will be capable of executing the steps of the method of the invention as embodied in a program product. Persons skilled in the art will recognize immediately that, although some of the exemplary embodiments described in this specification are oriented to software installed and executing on computer hardware, nevertheless, alternative embodiments implemented as firmware or as hardware are well within the scope of the present invention.
It will be understood from the foregoing description that modifications and changes may be made in various embodiments of the present invention without departing from its true spirit. The descriptions in this specification are for purposes of illustration only and are not to be construed in a limiting sense. The scope of the present invention is limited only by the language of the following claims.