The present invention relates generally to digital data processing, and more particularly to the generation and execution of database queries in a digital computer system.
In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises hardware in the form of one or more central processing units (CPU) for processing instructions, memory for storing instructions and other data, and other supporting hardware necessary to transfer information, communicate with the external world, and so forth. From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Programs which direct a computer to perform massive numbers of these simple operations give the illusion that the computer is doing something sophisticated. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but doing it much faster. Therefore continuing improvements to computer systems require that these systems be made ever faster.
The overall speed at which a computer system performs day-to-day tasks (also called “throughput”) can be increased by making various improvements to the computer's hardware design, which in one way or another increase the average number of simple operations performed per unit of time. The overall speed of the system can also be increased by making algorithmic improvements to the system design, and particularly, to the design of software executing on the system. Unlike most hardware improvements, many algorithmic improvements to software increase the throughput not by increasing the average number of operations executed per unit time, but by reducing the total number of operations which must be executed to perform a given task.
Complex systems may be used to support a variety of applications, but one common use is the maintenance of large databases, from which information may be obtained. Large databases usually support some form of database query for obtaining information which is extracted from selected database fields and records. Such queries can consume significant system resources, particularly processor resources, and the speed at which queries are performed can have a substantial influence on the overall system throughput.
Conceptually, a database may be viewed as one or more tables of information, each table having a large number of entries (analogous to rows of a table), each entry having multiple respective data fields (analogous to columns of the table). The function of a database query is to find all rows, for which the data in the columns of the row matches some set of parameters defined by the query. A query may be as simple as matching a single column field to a specified value, but is often far more complex, involving multiple field values and logical conditions. A query may also involve multiple tables (referred to as a “join” query), in which the query finds all sets of N rows, one row from each respective one of N tables joined by the query, where the data from the columns of the N rows matches some set of query parameters.
Execution of a query involves retrieving and examining records in the database according to some search strategy. For any given logical query, not all search strategies are equal. Various factors may affect the choice of optimum search strategy and the time or resources required to execute the strategy.
For example, one of the factors affecting query execution is the sequential order in which multiple conditions joined by a logical operator, such as AND or OR, are evaluated. The sequential order of evaluation is significant because the first evaluated condition is evaluated with respect to all the entries in a database table, but a later evaluated condition need only be evaluated with respect to some subset of records which were not eliminated from the determination earlier. Therefore, as a general rule, it is desirable to evaluate those conditions which are most selective first. Another factor can be the presence of certain auxiliary database structures which may, if appropriately used, provide shortcuts for evaluating a query. One well known type of auxiliary database structure is an index. An index is conceptually a sorting of entries in a database table according to the value of one or more corresponding fields (columns). If a query includes an indexed value as a condition, it may be advantageous to use the index to determine responsive records, rather than examine each record in the applicable table. Query execution may be affected by any number of factors in addition to those described above.
Auxiliary database structures, such as indexes, are typically defined by a database designer, administrator or similar person. A well-designed database typically contains various auxiliary database structures to support query execution or for other purposes. Once defined, these structures are automatically maintained by database management software as changes are made to the underlying database records.
To support database queries, large databases typically include a query engine which executes the queries according to some automatically selected search (execution) strategy, using the known characteristics of the database and other factors. Some large database applications further have query optimizers which construct search strategies, and save the query and its corresponding search strategy for reuse. In such systems, it may be possible to construct and save multiple different query execution strategies for a single query.
Where an auxiliary database structure, such as an index, exists and is useful in executing a query, a query optimizer may construct an execution strategy to take advantage of the existing auxiliary database structure as a shortcut to executing the query. However, in many complex queries, it is desirable or essential to employ some auxiliary database structure in executing the query, but no such defined auxiliary database structure already exists. In these cases, the query optimizer may generate an execution strategy which itself constructs the needed auxiliary database structure in executing the query. An internally constructed auxiliary database structure which is not defined in the database specification, and is therefore generally hidden from users, is referred to herein as a “latent auxiliary database structure”. Conventionally, such latent auxiliary database structures are only for temporary use in executing a single instance of a query. Because the latent auxiliary database structure is not defined in the database specification, and is not maintained by database management software, it is generally deleted after the database query executes.
A query execution strategy which constructs a latent auxiliary database structure for temporary use may itself be saved and reused many times to execute different instances of the same query. Conventionally, such a query execution strategy will reconstruct the latent auxiliary database structure from scratch each time it executes. Although not necessarily recognized in the art, an improvement in execution performance could be obtained if, in selective circumstances, such a latent auxiliary database structure could be saved and reused when another instance of the same query executes. Furthermore, although not necessarily recognized in the art, an additional improvement in execution performance could be obtained if, in appropriate circumstances, such a latent auxiliary database structure could be made available for use in executing queries other than the one for which it was originally constructed.
A latent auxiliary database structure constructed for executing a query is automatically saved for reuse in subsequent execution instances of the same and/or a different query. In general, the latent auxiliary database structure is not maintained concurrently with changes to records in the database tables. If a query execution strategy is subsequently invoked which uses the latent auxiliary database structure, a determination is made dynamically whether to update the previously saved database structure from database changes recorded in one or more change logs, or to rebuild the latent auxiliary database structure from scratch. Where the cost of updating from the change log is deemed less than the cost of a rebuild from scratch, the previously saved latent auxiliary database structure is updated from the log.
In the preferred embodiment, a latent auxiliary database structure can be used by any query execution strategy, and is not limited to use by the strategy which originally generated it. In an alternative embodiment, a latent auxiliary database structure might be used only by the strategy which created it.
In the preferred embodiment, each saved latent auxiliary database structure is associated with one or more query execution strategies which use it. These execution strategies may be different strategies for the same query, or strategies for different queries. Execution strategies are deleted from time to time as new execution strategies and/or queries are constructed, changes are made to the database, etc. When a latent auxiliary database structure is no longer associated with any query execution strategies as a result of such changes, it is automatically deleted.
By saving latent auxiliary database structures without concurrent maintenance in accordance with the preferred embodiment, negligible additional administrative burden is placed on database management software. However, the latent auxiliary structure is available should it be needed. It will only be updated (from the change log), where the cost of doing so is less than that of rebuilding from scratch (as in conventional art),and therefore will only be updated in those cases where an actual cost saving is projected.
The details of the present invention, both as to its structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:
Referring to the Drawing, wherein like numbers denote like parts throughout the several views,
One or more communications buses 105 provide a data communication path for transferring data among CPU 101, main memory 102 and various I/O interface units 111-114, which may also be known as I/O processors (IOPs) or I/O adapters (IOAs). The I/O interface units support communication with a variety of storage and 1/0 devices. For example, terminal interface unit 111 supports the attachment of one or more user terminals 121-124. Storage interface unit 112 supports the attachment of one or more direct access storage devices (DASD) 125-127 (which are typically rotating magnetic disk drive storage devices, although they could alternatively be other devices, including arrays of disk drives configured to appear as a single large storage device to a host). I/O device interface unit 113 supports the attachment of any of various other types of I/O devices, such as printer 128 and fax machine 129, it being understood that other or additional types of I/O devices could be used. Network interface 114 supports a connection to an external network 130 for communication with one or more other digital devices. Network 130 may be any of various local or wide area networks known in the art. For example, network 130 may be an Ethernet local area network, or it may be the Internet. Additionally, network interface 114 might support connection to multiple networks.
It should be understood that
Although only a single CPU 101 is shown for illustrative purposes in
Computer system 100 depicted in
While various system components have been described and shown at a high level, it should be understood that a typical computer system contains many other components not shown, which are not essential to an understanding of the present invention. In the preferred embodiment, computer system 100 is a computer system based on the IBM i/Series™ architecture, it being understood that the present invention could be implemented on other computer systems.
Database 202 contains one or more tables 203-205 (of which three are shown in
Database management system 214 provides basic functions for the management of database 202. Database management system 214 may theoretically support an arbitrary number of database tables, which may or may not have related information, although only three tables are shown in
Query optimizer 215 generates query execution strategies for performing database queries. As is known in the database art, the amount of time or resource required to perform a complex query on a large database can vary greatly, depending on various factors, such as the availability of an index or other auxiliary data structure, the amount of resources required to evaluate each condition, and the expected selectivity (i.e., number of records eliminated from consideration) of the various logical conditions. Optimizer 212 determines an optimal execution strategy according to any optimizing algorithm, now known or hereafter developed, and generates an execution strategy, also known as an “access plan” or “plan”, according to the determination. The execution strategy is a defined series of steps for performing the query, and thus is, in effect, a computer program. The optimizer 215 which generates the execution strategy performs a function analogous to that of a compiler, although the execution strategy data is not necessarily executable-level code. It is, rather, a higher-level series of statements which are interpreted and executed by query engine 216.
A query can be saved as a persistent storage object in memory, and can be written to disk or other storage. Once created by optimizer 212, a query execution strategy can be saved with the query as part of the persistent storage object. The query can be invoked, and a saved query strategy re-used (re-executed), many times. For a given query, it is possible to generate and save one, or optionally multiple, query execution strategies, each optimized for different respective conditions. E.g., where a query contains an imported (“host”) variable in one of its conditions, the value of which is specified at the time the query is executed, different query execution strategies could be used for different values of the imported variable. Different query execution strategies might also be used for different environmental parameters under which the query is to be executed. In addition to saving one or more query execution strategies, certain metadata with respect to each query execution strategy can be saved, such as the conditions for which the strategy was generated and historical data concerning the execution of the strategy.
A defined structure such as defined index 206, defined materialized query table 207 or defined histogram 208 is an auxiliary database structure which is specified by a database designer, administrator, or similar person as part of the database definition, i.e. a data structure which is part of the database and is available for use by users accessing the database. Database management system 214 automatically maintains such defined auxiliary database structures concurrently with changes made to records in database tables 203-205. A query execution strategy will often use a defined auxiliary database structure such as index 206 or MQT 207 to execute a query. In some cases, it is necessary or desirable to use such a structure where no defined structure exists. In these cases, the query execution strategy may build a latent auxiliary data structure as an intermediate step in executing the query. Latent index object 212 and latent MQT are two examples of latent auxiliary database structures, it being understood that a latent auxiliary database structure could include other types of objects constructed to execute a query where no suitable defined structure exists, such as a hash table, a sorted list, a simple copy of intermediate query results, or a bitmap of records or record references the corresponds to potentially selected records from a set of selection. Conventionally, latent structures are typically deleted after the query executes. In the preferred embodiment, these latent structures are saved as persistent objects for possible later reuse. Except when being actively used for a query, these latent structures are not concurrently maintained as changes are made to the database table or tables.
The collection of saved queries, query execution strategies, saved latent indexes, MQT's and so forth, and associated data, is loosely referred to as the “plan cache”.
Although one database 202 having three database tables 203-205, one defined index 206, one defined MQT 207 and one defined histogram 208 are shown in
In addition to database management system 214, one or more user applications (not shown) may access data in database tables 203-205 to perform tasks on behalf of one or more users, and may use defined auxiliary database structures to do so. Such user applications may execute on computer system 100, or may access the database from remote systems. Such user applications may include, e.g., personnel records, accounting, code development and compilation, mail, calendaring, or any of thousands of user applications.
Various software entities are represented in
While the software components of
Each execution strategy block 302, 303 contains data relating to a particular execution strategy for the query. As is known in the art of database management, the choice of an optimal query execution strategy could depend in numerous factors, including the resources allocated to a particular user or process invoking a query, the values of imported variables within the query, the state of the system, and so forth. Query optimizer 216 can generate, and database manager 214 can save, multiple query execution strategies for a given query, each appropriate for use under a different respective set of conditions. Each execution strategy block 302 corresponds to a respective execution strategy for the query.
In general, an execution strategy block 302, 303 contains a strategy header portion 314, 315 comprising various data for use by the data management system in selecting or maintaining the strategies. For example, a strategy header may contain host variable, environmental parameter or other execution conditions governing use of the corresponding strategy to enable the data management system to determine whether the strategy should be selected for executing a particular instance of the query; historical performance statistics of the corresponding strategy; and other data. Among the data included in the strategy header is a reference 316, 317 to any latent objects 212, 213 which are accessed by the corresponding strategy during execution. A single strategy could reference zero, one or multiple latent objects.
Each strategy block 302, 303 further contains a respective set of strategy instructions 318, 319 for executing the corresponding strategy. In the preferred embodiment, these are not directly executable code, but are higher-level instructions which are interpreted by the query engine 216 to execute the query. These instructions determine the order of searching certain tables, whether or not indexes or other auxiliary data structures are used to search the database records, and so on. In particular, these instructions may access latent objects 212, 213 in order to execute the query. The referencing of latent objects during execution is described in greater detail below with respect to
A database index object 212 corresponds to a particular database table, and provides a sorted ordering of the records in that database table according to some logical criterion. Typically, an index sorts the database according to the value of a specific field, and where there are multiple occurrences of the same value, may use one or more additional fields to sort among those multiple values. However, an index could use some other sorting criterion, such as a mathematical function of multiple field values. A database may have many indexes, each being used to sort the database records according to some different criterion, such as different field values.
Each index object contains a header 401 and a plurality of entries 402, each entry 402 corresponding to a respective entry or row of a database table to which the index corresponds. Header 401 contains information useful for using and maintaining the index object, such as an identification of the table to which it corresponds, object boundaries, and so forth. In particular, in the case of latent index objects 212, header 401 contains a strategy count 403. Strategy count 403 is an integer recording the number of query strategies which access the latent index object, and is used for maintenance purposes to delete objects no longer needed, as explained further herein. Each entry 402 contains a respective reference 404 and one or more values 405 (of which one is shown in
A materialized query table (MQT) object 213 is conceptually a table representing the results of a hypothetical defining query. The defining query could be a query against information in a single database table, or could be a join of information in multiple database tables. The hypothetical defining query is often a sub-part of multiple more complex queries, which may use the materialized query table for execution.
A representative materialized query table object 213 shown in
Each entry or row 402 of MQT object 213 contains a one or more respective references 415, 416 and values 417, 418, the number of each depending on the defining query. In the simple case where the defining query operates on a single database table 203, each row of the MQT represents an entry from that single database table which satisfies the conditions of the defining logical query. In the case of a defining query which joins a set of multiple database tables, each row of the MQT represents a set of entries from each of the set of multiple database tables joined by the query, the set of entries satisfying the conditions of the defining query.
Among the functions supported by database management system 214 is the making of queries against data in database 202, which are executed by query engine 216. As is known, queries typically take the form of statements having a defined format, which test records in the database to find matches to some set of logical conditions. Typically, multiple terms, each expressing a logical condition, are connected by logical conjunctives such as “AND” and “OR”. Many queries use existing defined auxiliary database structure such as indexes, materialized query tables and the like to reduce the scope of execution activity. Because database 202 may be very large, having a very large number of records, and a query may be quite complex, involving multiple logical conditions, a suitable index or similar structure does not always exist for a given need. It is often helpful to generate some index or other auxiliary database structure, which is not defined in the database, solely for the purpose of executing the query. In conventional systems, such structures are typically used once and deleted after use.
In accordance with the preferred embodiment of the present invention, latent indexes, materialized query tables or other auxiliary database structures generated for the purpose of executing a query are saved for possible re-use as part of plan cache 209. If another instance of the same or a different query subsequently executes, which would otherwise generate the same latent auxiliary data structure for its use, a determination is made whether to update the previously generated and saved latent auxiliary database structure as an alternative to generating a new one. This decision will depend on various factors, but generally it depends on the volume of intervening change activity. If the cost of updating the previously generated latent structure is projected to be less than that of generating a new structure, the previously generated structure is updated for use in the new query instance.
For a new query, a requesting user formulates and submits a database query using any of various techniques now known or hereafter developed (step 501). E.g., the database query might be constructed and submitted interactively using a query interface in database management system 214, might be submitted from a separate interactive query application program, or might be embedded in a user application and submitted by a call to the query engine 216 when the user application is executed. A query might be submitted from an application executing on system 100, or might be submitted from a remote application executing on a different computer system. In response to receiving the query, the system parses the query into logical conditions to generate a query object (step 502), which may be saved for re-use.
The system invokes optimizer 215 to generate an optimized execution strategy block for the query. Optimizer 215 determines an optimum execution strategy for the given query using any conventional technique or any technique hereafter developed, and may take into account imported variable values, environmental parameters, or other data. In particular, in at least some cases optimizer 215 may determine that, during execution a latent auxiliary database structure, such as index 212 or materialized query table 213, should be constructed as an intermediate step in executing the query. The process of generating an execution strategy is represented in
Where an existing query is re-used, a requesting user selects the existing query object for re-use and invokes it, using any of various techniques now known or hereafter developed (step 504). E.g., the query might be selected interactively from a menu in database management system 214, might be submitted from a separate interactive application program, or might be embedded in a user application and submitted by a call to the query engine 216 when the user application is executed, any of which might be performed from system 100, or from a remote system. Re-using an existing query may require specifying one or more imported variable values or other conditions of execution
In response to invoking the query, query optimizer 215 determines whether a saved strategy exists in the query object 210 (step 505). If no such strategy exists (the ‘N’ branch from step 505), the optimizer generates one (step 503), as in the case of a new query. If a previously saved execution strategy exists for the query (the ‘Y’ branch from step 505), the optimizer determines whether the saved execution strategy is suitable for use under the execution parameters of the current query instance (step 506). This determination may be made using any appropriate technique, now known or hereafter developed, but in general the optimizer accesses certain data in header 314 associated with the query execution strategy, which were saved when the strategy was initially generated, to determine whether the existing strategy can be re-used for the current execution parameters (e.g., imported variable values, system configuration, database configuration etc. If the saved execution strategy is not suitable for use in the current query instance, then the ‘N’ branch is taken from step 506, and the database management system looks for another previously saved execution strategy (step 507), continuing then to step 505. The database management system continues to look for execution strategies (loop at steps 505-507) until a suitable strategy is found (the ‘Y’ branch from step 506) or there are no more strategies (the ‘N’ branch from step 505).
If a suitable execution strategy is found, the ‘Y’ branch is taken from step 506, and the execution strategy is selected (step 508). Where multiple execution strategies are permissible (multiple strategies satisfy their respective logical conditions), the database manager will choose one of these multiple strategies. Such a choice could be based on priorities, or any criteria or technique now known or hereafter developed, or could be arbitrary. After selecting a strategy, the database management system proceeds to step 509.
The query engine is then invoked to execute the query according to the query execution strategy which was either generated at step 503 or selected at step 508. Generally, this means that the query engine retrieves selective database records according to the query execution strategy, and evaluates the logical query conditions with respect to the selected record in an order determined by the strategy. E.g., for a conjunction of logical ANDs, each successive condition is evaluated until a condition returns “false” (which obviates the need to evaluate any further conditions) or until all conditions are evaluated. In particular, the query engine may use a latent auxiliary database structure if the chose execution strategy so requires. Execution is represented in
The query engine then generates and returns results in an appropriate form (step 510). E.g., where a user issues an interactive query, this typically means returning a list of matching database entries for display to the user. A query from an application program may perform some other function with respect to database entries matching a query.
At some point, some stage or steps(s) may require the use of an auxiliary data structure such as an index, represented as the ‘Y’ branch from step 602. If a defined auxiliary data structure of the required type exists (i.e., one which is part of the database specification, and which is regularly maintained by the database management system 214), the ‘Y’ branch is taken from step 603 the defined auxiliary structure is used for implementing the stage or step(s) of query execution (step 604). If no such defined auxiliary structure exists, the ‘N’ branch is taken from step 603. In this case, if a latent auxiliary database structure of the required type already exists (i.e., one which was previously created as a result of executing another execution strategy, either for the same query or a different query), then the ‘Y’ branch is taken from step 605, and the existing latent auxiliary database structure is used for implementing the stage or step(s) of query execution (step 606). If neither a defined structure nor a latent structure of the required type already exists, the ‘N’ branch is taken from step 605, and a new latent auxiliary database structure of the required type is tentatively created (or assumed) for purposes of implementing the state or step(s) of query execution (step 607). In the case of either using an existing latent structure (step 606) or creating a new latent structure (step 607), additional steps are added to the stage or step(s) of query execution to assure that the latent structure is current (step 608). I.e., unlike the case of a defined auxiliary structure, which is maintained automatically by the database management system, a latent structure is not automatically maintained, and generally will not be current. In order to assure that the latent structure is current at the time of query execution, it will either have to be updated using the change logs or rebuilt at execution time.
The query optimizer may then return to step 601 to construct more stages or steps (the ‘Y’ branch from step 609). The optimizer may construct additional stages or steps to perform some different part of the query execution task, or may construct additional stages or steps as alternatives to stages or steps already constructed. When finished constructing stages or steps of execution, the ‘N’ branch is taken from step 609. The optimizer may thus use an arbitrary number of defined and/or latent database structures for executing any given query.
At some point, the optimizer will evaluate different stages or steps, and choose a best path (i.e., sequence of stages or steps) for executing the query (step 610). Evaluation and choice of path is performed by conventional optimizers, and any conventional technique, or any technique developed in the future, may be used for choosing a best path. In evaluating a best path, the optimizer will generally consider the cost of updating a latent auxiliary database structure. I.e., it may choose to use an alternative strategy which does not require the latent auxiliary structure because the projected cost of updating the structure outweighs the benefits of its use. The “cost”, as used herein, could be any appropriate measure of resource cost, and may take into account factors such as CPU utilization, storage or other I/O operations, response time, or other measures of “cost”.
In evaluating cost using conventional optimization techniques, the cost of any path requiring a latent auxiliary structure will include the cost of building the latent auxiliary structure from scratch, because it is assumed that such a structure is for one-time use only with each query execution instance, and will be deleted after the query executes. I.e., the cost of Plan PL (CostPL) which uses latent auxiliary object L can be generally expressed as:
CostPL=Cost_Exec—L+Cost_Build—L (1)
where Cost_Build_L is the cost of building latent auxiliary object L and Cost_Exec_L is the cost of executing plan PL using latent auxiliary object L, once it is available. It is possible to use such an evaluation technique in accordance with the preferred embodiment of the present invention. However, this must be viewed as a worst-case cost measure, because it is hoped that in at least some cases latent auxiliary database structure L will be re-used. Therefore it may alternatively be possible to amortize the cost of building the latent auxiliary structure over multiple projected execution instances of the query. Such as cost evaluation technique can be generally expressed as:
CostPL=Cost_Exec—L+(Cost_Build—L+Cost_Update—L)/#Exec (2)
where #Exec is the number of projected execution instances and Cost_Update_L is the projected cost of updating auxiliary object L over the projected execution instances. Although equation (2) is theoretically more accurate, it may be difficult to obtain accurate projections of the number of execution instances, and inaccurate projections could lead to unnecessary creation and proliferation of latent auxiliary objects. Projections could be obtained by recording the number of execution instances in a sample time interval, and periodically re-evaluating query plans to determine whether the frequency of execution justifies construction of a latent auxiliary object.
Once a best path sequence has been chosen from the multiple possibilities, the optimizer generates a set of execution strategy instructions for implementing the best path sequence of stages or steps (step 611); these execution strategy instructions are used by the query engine 216 when executing the query.
If the execution strategy thus chosen and constructed requires the use of a latent auxiliary database structure object (the ‘Y’ branch from step 612), then, with respect to each such latent structure, if the latent object already exists (the ‘Y’ branch from step 613), the strategy reference count 403, 413 in the latent objects's header 401, 411 is incremented by one to reflect that one more strategy uses the corresponding latent object (step 615); and if the latent object does not already exist (the ‘N’ branch from step 613), a corresponding latent object of minimal size, generally including only a header or portion of a header, is allocated, the strategy reference count being initialized to one (step 614).
The set of execution strategy instructions generated at step 611 are saved as a new strategy block 302, 303 in the query object, along with any required header information (step 616). If any latent objects are used by the strategy, appropriate references 316, 317 to the latent objects are inserted into the strategy block header 314, 315.
Referring to
If, at step 701, the instruction requires access to a latent object, the ‘Y’ branch is taken from step 701, and the corresponding latent object is accessed to determine the time at which it was last updated to a current state (step 703). Appropriate database change logs or other structures are then accessed estimate a cost of updating the latent object from changes recorded in the change log and a cost of rebuilding the latent object from scratch (step 704). I.e., the existing latent object, which is typically not current because it is not regularly maintained, can be updated to a current state in which it accurately reflects the current contents of the database tables by accessing one or more change logs which records changes to the database table contents, and processing each change occurring after the latent object was last updated to modify the latent object accordingly. Alternatively, the existing latent object can be simply discarded, and a new latent object rebuilt from scratch by examining the current database table or tables. Where the latent object was last updated relatively recently and the number of database changes since update is not large, it will generally be less costly to update the existing latent object from the change logs. As the length of time since update and the number of changes increases, the cost of updating the existing latent object from the change logs increases correspondingly. At some point, the changes will be so numerous that it will be less costly to reconstruct the latent object from scratch using the current database tables, without reference to the change logs. Various conventional techniques exist for estimating a resource “cost” associated with updating an auxiliary database structure from a change log as well as for rebuilding an auxiliary database structure from scratch, and any of these techniques, or any technique hereafter developed, could be used.
If the projected cost of rebuilding is less than the projected cost of updating from the change logs (the ‘Y’ branch from step 705), the latent object is rebuilt from scratch using the current database table or tables, the rebuilt object replacing the existing latent object (step 706). If the projected cost of rebuilding is not less than the projected cost of updating from the logs (the ‘N’ branch from step 705), the change entries in the change log(s) are accessed and the existing latent object is updated to conform to the change entries in the change logs, bringing it to a current state (step 707). In either case, the rebuilt or updated latent object is returned for use in executing the query (step 708).
Steps 703-708 could be implemented in either the optimizer or the query engine. I.e., steps 703-708 could be performed by the query engine responsive to explicit instructions contained in the execution strategy which are placed there by the optimizer, in which case the query engine might not itself require any modification to implement the present invention according to its preferred embodiment. Alternatively, the query engine could interpret an instruction in an execution strategy requiring access to a latent object to require that the latent object be brought current as explained above; in this alternative, the optimizer would simply insert an instruction to access the latent object, and the query engine would be responsible for interpreting the instruction to require appropriate updating or rebuilding of the object as described.
The strategy reference count 403, 413 in the header of a latent auxiliary database structure (object) reflects the number of strategies which access that latent object. As explained above, each time a new strategy is created which accesses the latent object, the strategy reference count is incremented. By the same token, if a strategy is deleted for any reason, the corresponding strategy reference count of any latent object accessed by the deleted strategy is decremented by one. Strategies may be deleted for any of various reasons, e.g., a strategy may be superseded by a new strategy constructed by the optimizer; a query for which the strategy was constructed might be explicitly deleted by a user; etc. If a latent object's strategy reference count is decremented to zero, then it is known that no more strategies use that latent object. In this case, the latent object itself will be automatically deleted by the database manager.
Among the advantages of the technique described herein as a preferred embodiment is that no additional burden is assumed in maintaining various latent auxiliary database structures, unless some cost benefit can be projected therefrom. A projected cost analysis, based on actual number of changes in the change logs and size of the database tables, should be reasonably accurate. Only if there is a cost savings will a latent object be updated; in all other cases, the latent object will be rebuilt as in conventional art.
In the preferred embodiment described above, the generation and execution of the query, and the use of latent auxiliary database objects, is described as a series of steps in a particular order. However, it will be recognized by those skilled in the art that the order of performing certain steps may vary, and that variations in addition to those specifically mentioned above exist in the way particular steps might be performed. In particular, the manner in which queries are written, parsed or compiled, and stored, may vary depending on the database environment and other factors.
In general, the routines executed to implement the illustrated embodiments of the invention, whether implemented as part of an operating system or a specific application, program, object, module or sequence of instructions, are referred to herein as “programs” or “computer programs”. The programs typically comprise instructions which, when read and executed by one or more processors in the devices or systems in a computer system consistent with the invention, cause those devices or systems to perform the steps necessary to execute steps or generate elements embodying the various aspects of the present invention. Moreover, while the invention has and hereinafter will be described in the context of fully functioning computer systems, the various embodiments of the invention are capable of being distributed as a program product in a variety of forms, and the invention applies equally regardless of the particular type of signal-bearing media used to actually carry out the distribution. Examples of signal-bearing media include, but are not limited to, volatile and non-volatile memory devices, floppy disks, hard-disk drives, CD-ROM's, DVD's, magnetic tape, and so forth. Furthermore, the invention applies to any form of signal-bearing media regardless of whether data is exchanged from one form of signal-bearing media to another over a transmission network, including a wireless network. Examples of signal-bearing media are illustrated in
Although a specific embodiment of the invention has been disclosed along with certain alternatives, it will be recognized by those skilled in the art that additional variations in form and detail may be made within the scope of the following claims: