The subject invention relates generally to databases, and in particular to automatic physical database tuning.
Electronic storage mechanisms have enabled accumulation of massive amounts of data. For instance, data that previously required volumes of books for recordation can now be stored electronically without expense of printing paper and with a fraction of physical space needed for storage of paper. In one particular example, deeds and mortgages that were previously recorded in paper volumes can now be stored electronically. Moreover, advances in sensors and other electronic mechanisms now allow massive amounts of data to be collected and stored. For instance, GPS systems can determine location of an individual or entity by way of satellites and GPS receivers, and electronic storage devices connected thereto can then be employed to retain locations associated with such systems. Various other sensors and data collection devices can also be utilized for obtainment and storage of data.
Database systems are often employed for storage and organization of data, wherein such databases can be queried by users to retrieve desirable data. In an exemplary database system, relational databases include redundant structures, such as indexes and materialized views that are employed to correlate rows and columns between disparate tables. The tables and redundant structures are physically created to provide for efficient computation of a given workload (queries and/or updates) while maintaining space constraints. Database systems have been widely deployed and applications associated therewith have become increasingly complex and varied. Conventionally, individuals that designed the database are retained to manage such database and physically alter the database in accordance with a given workload. To reduce expense of staffing a human as a database manager, automatic database tuning systems have been developed to determine a physical configuration of a database given a particular workload, wherein processing and space considerations are balanced.
Complexity of these automatic systems, however, has increased with growth and complexity of database systems. In particular, candidate access paths are heuristically chosen based upon a structure of each input query within a workload, and a “bottom-up” search is performed to identify an optimal physical configuration. In other words, for each query in a workload a set of candidate structures is located, wherein columns that may be useful as index keys and/or sub-expressions have impact upon materialized views are heuristically chosen. Candidate structures are then augmented to improve performance while accounting for space considerations. More specifically, new candidate structures are generated that, while not optimal for a given query, may be beneficial to multiple queries within the workload while reducing space and/or maintaining space in accordance with space constraints. Upon locating one or more valid configurations, the valid configuration (which is empty) is incrementally added or changed until a space constraint is violated. These conventional systems/methods can include various special cases, shortcuts, and heuristics that render such systems difficult to analyze, extract properties, and render it difficult to alter a database system.
The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. This summary is not an extensive overview of the invention. It is not intended to identify key/critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some concepts of the invention in a simplified form as a prelude to the more detailed description that is presented later.
The subject invention relates to automatically tuning a database system in accordance with a workload. An optimizer is employed to receive and analyze queries and/or updates within the workload and generate an execution plan based thereon, wherein the execution plan can be a most efficient plan over a space of all possible physical configurations of the database. In more detail, the optimizer receives a query and/or update from within the workload and requests indexes and/or materialized views that are optimal with respect to such query and/or update. These physical structures can then be simulated and provided to the optimizer to create a collection of such physical structures. Given these structures, the optimizer can generate an execution plan, and a resulting configuration can be obtained from such execution plan by noting which physical structures are employed by the execution plan. Thus, rather than the optimizer being given a multitude of possible configurations for a particular workload, the optimizer is provided with best configurations for each query and/or update within the workload.
Upon obtaining a configuration based upon the execution plan generated by the optimizer, size of such configuration can be compared with a threshold size (e.g., size of available memory in a database), and if the size is within the threshold, the configuration can be implemented within a database system. If the configuration is associated with a size above the threshold size, such configuration can be relaxed utilizing a variety of relaxing techniques, thereby reducing size associated with the configuration. For instance, indexes and/or materialized views can be merged within the configuration, thereby reducing size. Furthermore, indexes and/or materialized views can be removed from the configuration to reduce size of such configuration. Accordingly, as size of the configuration is reduced, performance associated therewith can likewise be reduced.
Therefore, a particular relaxation technique to employ can be selected and various cost estimates can be generated. In particular, an estimate of amount of space reduced by a proposed relaxation technique can be created. Further, an estimate of decrease in performance associated with a proposed relaxation technique can be generated and analyzed in connection with the estimate of amount of space reduced. A ratio that is indicative of a quality of a relaxation technique can be created and utilized to select a possible relaxation from a plurality of relaxations. Each time a relaxation technique is undertaken upon the configuration, such configuration can be analyzed to determine if it is associated with a size beneath a threshold. In accordance with one aspect of the subject invention, a configuration can be continuously relaxed, and a configuration chain can be analyzed to locate a relaxation associated with a highest cost. Thereafter, disparate relaxation techniques can be employed at such position. Any suitable system and/or methodology for selecting a manner in which to relax the configuration, however, is contemplated and intended to fall under the scope of the hereto-appended claims.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the invention are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the invention may be employed and the subject invention is intended to include all such aspects and their equivalents. Other advantages and novel features of the invention may become apparent from the following detailed description of the invention when considered in conjunction with the drawings.
The subject invention is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the subject invention. It may be evident, however, that the subject invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the subject invention.
As used in this application, the terms “component” and “system” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs.
Furthermore, the subject invention may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed invention. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Additionally it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the subject invention.
The subject invention will now be described with respect to the drawings, where like numeral represent like elements throughout. Referring now to
As used herein, the term “inference” refers generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Various classification schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines . . . ) can be employed in connection with performing automatic and/or inferred action in connection with the subject invention.
The optimizer 102 can optimize a single query and/or update by issuing a plurality of access path requests for indexes and materialized views. For example, an access path generation component associated with the optimizer (not shown) can identify columns in sargable predicates, required sort columns, and columns that are additionally referenced upwards in a query tree. A sargable predicate is a superset of optimizable predicates, which are operators or functions that return a Boolean value. The access path generation component can then analyze available indexes and return one or more alternative physical plans that may be optimal for an input logical sub-query. In general, each generated plan is an instance of a template tree that (i) has one or more index seeks (or index scans) at leaf nodes, (ii) combines leaf nodes by binary intersections or unions, (iii) applies an optional rid lookup to retrieve missing columns, (iv) applies an optional filter operator for non-sargable predicates, and (v) applies an optional sort operator to enforce order. For example, the optimizer 102 can request an index associated with the following sub-query (where τ specifies order):
τD(ΠD,E(σA<10ˆB<10ˆA-C-8(R)))
Utilizing the above sub-query, the optimizer component 102 can identify column A and B in sargable predicates, column D as a required order, and columns E and C as additional columns that are referenced by non-sargable predicates and/or upwards in the template tree. Given this sub-query, the optimizer component 102 can consider a space of various possible plans for available indexes and return a most efficient physical strategy. For example, multiple plans can be proposed, such as one that reviews all indexes, one that avoids intersecting indexes but performs a greater number of rid lookups, and one that analyzes an index covering columns D, A, B, C, and E, and the optimizer component 102 can select the most efficient of the three.
In accordance with an aspect of the subject invention, an analysis can be undertaken each time the optimizer component 102 issues an index or view request, wherein optimization can be temporarily suspended and the request can be analyzed. For instance, all sargable and non-sargable predicates, order, and additional columns can be considered, as well as SPJG sub-queries in view requests. Such requests are utilized to implicitly encode a plurality of physical structures 104-108 that the optimizer component 102 can exploit. While it is shown that the optimizer component 102 can receive a plurality of physical structures, it is understood that the optimizer component 102 can receive any integer number of physical structures. As size associated with these physical structures would be substantial, it may not be possible to provide the optimizer component 102 with the actual structures. Accordingly, the physical structures 104-108 can be simulated in system catalogs and optimization can resume. Once all suitable physical structures are simulated, the optimizer component 102 can output a configuration 110, wherein the configuration 110 can be an optimal configuration, as the optimizer component 102 is provided with all suitable physical structures regardless of size. Such a procedure can be repeated for each index and/or view request, thereby providing the optimizer component 102 with an optimal set of physical structures to implement logical plans, and the configuration 110 can be based at least in part upon such logical plans. More specifically, the configuration 110 is obtained by way of gathering all simulated physical structures 104-108 that are generated during optimization. Since index requests and view requests are intercepted during optimization, candidates that may not be apparent by viewing a final execution plan are not missed or ignored, as can happen when a “bottom-up” approach is employed.
The optimizer component 102 can utilize an input sub-query as an optimal view, since the input of the view request is the sub-query. Specifically, an optimal plan is a scan over any clustered index over such view. The following example is provided to assist in illustration of one exemplary manner of determining an optimal configuration for an index request. An index request (S, N, O, A) can be considered, where S are columns in sargable predicates, N includes a subset of columns in non-sargable predicates, O are columns in order requests, and A are other referenced columns. If there is no order requested (e.g., O=zero), the following lemma can be employed to restrict space of index sub-plans that can be considered.
Lemma 1: For any plan that intersects rids from two index seeks there is a more efficient plan that produces a substantially similar result by seeking one (larger) index.
If, additionally, |S|1=1 and N=zero, the following lemma can be employed.
Lemma 2: For any plan that employs rid lookups over a result of an index seek, there is a more efficient plan that produces a substantially similar result by seeking one (larger) index.
In instances that both of the above lemmas can be applied, the optimal plan does not include index intersections nor rid lookups, and therefore a covering index with key columns S and suffix columns A are sought. If several sargable predicates are present but N=zero, independence between can be presumed between predicates, thereby enabling an optimal plan to include a seek over a prefix of the columns in S sorted by selectivity, followed by a fetch (which can be optional). An optimal index can be efficiently identified by progressively including new columns from S to the index until no further benefit is obtained. In general, if the index request includes non-sargable predicates (e.g., N≠zero), more complexity is introduced as there can be interaction between columns (e.g., a predicate a+b>10 can be evaluated when an index for other sargable predicates over columns a and b is considered).
In another example, a general case of an index request (S, N, O, A) with O≠zero can be considered. If the configuration 110 produces rows in a desired order, the configuration 110 is an optimal plan. Otherwise, a sort operator at the root of this plan can be utilized to obtain an optimal plan that utilizes the sort. It is possible, however, that a disparate plan exists that does not require sorting and is more efficient. To obtain such a plan, an index with O as its key number can be created. If O⊂S, remaining columns in S can be added to the index as key columns and remaining columns in A can be added to the index as suffix columns. Otherwise, all columns in both A and S can be added as suffix columns. This plan is highly efficient and does not require use of a sort operator. Costs of the two alternatives can be compared, and the plan with minimal expected cost can be returned. The resulting configuration 110 is thus an optimal configuration that cannot be further improved for a defined workload. If space associated with the configuration 110 is less than a maximum allowed and the workload does not include updates, the configuration 110 can be returned without further processing and a database system can be configured according to the configuration 110.
If the configuration 110 is above an allowable size, a reduction component 112 can receive the configuration 110 and undertake selective operations upon such configuration 110 to create a reduced configuration 114 that is associated with an acceptable size. For instance, merge operations, reduction operations, split operations, prefixing operations, and clustering operations can be employed to selectively reduce in size the configuration 110. Once the reduced configuration 114 is an acceptable size, it can be returned to the database system as the physical configuration thereof. In more detail, the reduction component 112 receives the configuration 110 (which can be optimal) that is too large to fit into available space, and then progressively transforms the configuration 110 into new configurations that consume less space (but are less efficient) than previous configurations. Such an approach is more efficient than when compared with conventional “bottom-up” approaches. In more detail the configuration C={c1, . . . , cn} can be considered and relaxed into C′ by way of replacing c1 and c2 by c3 (e.g., an index on (a, b) and an index on (a, d) by an index on (a, b, d)). As C′ is composed of less efficient structures than C, it can be determined that a query that does not utilize indexes c1 or c2 in configuration C can remain unchanged in C′. Thus, only queries that employ some of the relaxed structures in C should be subject to re-optimization. In contrast, in a “bottom-up” strategy, adding a new index to an existing configuration requires re-optimization of all queries that reference the index table.
Various estimating and searching algorithms can be employed by the reduction component 112 to progressively reduce size of the configuration 110, thereby creating the reduced configuration 114. For example, for each proposed modification, a ratio of benefits in space against losses in computation time can be computed, and such ratio can be employed to selectively choose operations to be undertaken on the configuration 110. This ratio can be calculated with respect to index transformations, view transformations, and updates (e.g., removals, additions, and/or alterations). In another example, locating each configuration can include selecting a configuration and thereafter comparing benefits (estimated or otherwise) of the configuration against previously obtained configurations. A configuration that is perceived to perform better is retained, and disparate configurations are undertaken and performances associated therewith are estimated until a time constraint is violated. A best configuration at that time (e.g., the reduced configuration 114) is then provided to a database system.
Now turning to
Turning now to
Turning now to
Typically, as the optimizer component 402 is provided with indexes and materialized views that are optimal for each query within a workload, a resulting configuration 410 will be of substantial size (e.g., too large to implement within a database system). A reduction component 412 can be utilized to reduce size of the configuration 410 by way of merging, splitting, reducing, clustering, and prefixing operations. In more detail, the reduction component 412 can include a merging component 414 that is utilized to merge indexes as well as merge views, a splitting component 416 that can rearrange overlapping columns of indexes, a clustering component 418 that can promote an index to a clustered index, a prefixing component 420 that can be employed to prefix indexes, and a removal component 422 that can be utilized to remove indexes from the configuration 410.
With more specificity in regards to the reduction component 412 and components associated therewith, an index I can be associated with a sequence of key columns K and a set of suffix columns S, so that I=(K:S). Further, it can be delineated that if S1 and S2 are sequences S1∩S2 (and similarly S1=S2) can return a sequence that includes elements in an intersection (similarity, differences) of S1 and S2 in a substantially similar order as they appear in S1. Now referring to the merging component 414, merging of indexes I1=(K1:S1) and I2=(K2:S2) can be defined as an index that can answer all requests that either I1 and I2 can individually answer and that can be efficiently sought in cases that I1 can be sought. It is possible, however, that some requests that can be answered by seeking I2 may need to scan the merged index. In still more detail, the merging of indexes I1 and I2 can be defined as I1,2=(K1:(S1∪K2∪S2)−K1). If K1 is a prefix of K2, I1,2 can be defined as (K2:(S1∪S2)−K2). For instance, merging I1=([a,b,c]:{d,e,f}) and I2=([c,d,g]:{e}) results in I1,2=([a,b,c]:{d,e,f,g}). Accordingly, if the configuration 410 (C) is relaxed by way of merging I1 and I2, a reduced configuration 424 (C′) will result and be of the form C′=C−{I1,I2}∪{I1,2}.
As stated above, the reduction component 412 can further include a splitting component 416 that is employed to introduce index intersection plans by way of rearranging overlapping columns of existing (wider) indexes. For instance, if indexes I1=(K1:S1) and I2=(K2:S2) exist, splitting such indexes can produce a common index IC and possibly residual indexes IR1 and IR2. In practice, usages of index I1 (respectively, I2) by a less efficient index intersection between IC and IR1 (respectively, IR2), or rid lookups over IC's result if IR1 (respectively, IR2) does not exist. In particular, IC can be defined as being equal to (KC−K1∩K2:SC−S1∩S2) so long as KC is non-empty (index splits can remain undefined if K1 and K2 have no common columns). In turn, if K1 and KC are different, IR1=(K1−KC,I1−IC), and if K2 and KC are different IR2=(K2−KC,I2−IC). As an example, I1 can be defined as I1=([a,b,c]:{d,e,f}), I2 can be defined as I2=([c,a]:{e}), and I3 can be defined as I3=([a,b]:{d,g}). Splitting I1 and I2 results in IC=([a,c]:{e}), IR1=([b]:{d,f}) and IR2=([d]). Splitting I1 and I3 results in IC=([a,b]:{d}) and IR1=([c]:{e,f}). If the configuration 410 (C) is relaxed by splitting I1 and I2, the reduced configuration 424 (C′) can be defined as C′=C−{I1,I2}∪{IC,IR1,IR2}.
The clustering component 418 can also be employed to analyze the configuration 410 and make alterations to reduce such configuration 410 in size, thereby generating the reduced configuration 424. For instance, the clustering component 418 can promote an index I over a table T within the configuration 424 to a clustered index, so long as the configuration 410 does not include another clustered index over table T.
The prefixing component 420 can also be utilized to reduce size of the configuration component 410 to create the reduced configuration 424. For instance, index I can be defined so that I=(K:S). If a prefix K′ of K is taken (including K′=K if S is not empty), an index IP=(K′,0) can be obtained that can answer arbitrary requests that I answers by optionally performing rid lookups to obtain remaining columns (K−K′)∪S. If the configuration 410 (C) is relaxed by prefixing index I with IP, the reduced configuration 424 (C′) results so that C′=C−{I}∪{IP}. Moreover, the removal component 422 can be employed to remove an index I from the configuration 410 (C), leaving the reduced configuration 424, which can be defined as C′=C−{I}. The reduction component 412 can thus utilize various mechanisms and techniques to progressively reduce the configuration 410 until the resultant reduced configuration 424 lies within a size constraint.
The reduction component 412 can further be employed in connection with transforming views associated with the configuration 410, thereby reducing size of such configuration 410. For example, a view V can be denoted as a 6-tuple view so that V=(S, F, J, R, O, G), where S is a set of base-table or aggregate columns, F is a set of tables, J is a set of equi-join predicates, R is a set of range predicates, O is a conjunction of predicates not in J or R, and G is a set of base-table columns. Furthermore, it is understood that all aforementioned components can be empty except for S and F. For instance, an SQL equivalent for V can be written as follows:
SELECT S
FROM F
WHERE J AND R AND O
GROUP BY G
To further illustrate reduction of views, an SPJG query Q can be considered, where Q is desirably matched with view V=(SV, FV, JV, RV, OV, GV). Q can be rewritten as a 6-tuple query Q=(SQ, FQ, JQ, RQ, OQ, GQ), and a subsumption test can be applied to each pair of components. If all subsumption tests are successful, Q can be rewritten using V. Subsumption tests can vary among systems, wherein completeness and efficiency are balanced. In one example, for Q and V to match, FQ should be equal to FV, as V would have already matched a sub-query of Q during optimization if FV⊂=FQ. Further, OV's conjunctions should be included in OQ's conjunctions. Remaining components can be checked by way of using simple inclusions tests.
In accordance with an aspect of the subject invention, the merging component 414 can be employed to merge views and thus reduce size of the configuration 410. Similar to merging indexes, merging views V1 and V2 can result in a view VM from which all information within V1 and V2 can be extracted. More specifically, VM can be matched during instances that V1 and V2 are matched. To assist in defining view merges, V1 and V2 can be defined so that V1=(S1, F1, J1, R1, O1, G1) and V2=(S2, F2, J2, R2, O2, G2). Furthermore, as a condition for merging, F1 can be required to be equal to F2. Merging of V1 and V2 can be defined as
VM=(SM, FM, JM, RM, OM, GM), where FM=F1=F2, JM=J1∩J2, RM=R1 “merge” R2 (e.g., RM combines same-column range predicates in R1 and R2), OM=O1∩O2 (where the intersection uses structural equality), GM=G1∪G2 if both G1 and G2 are non-empty (if either is empty, GM=0), and SM=S1∪S2 if GM≠0. If GM=0, then SM=S1∪S2−SA∪SA′, where SA is a set of aggregated columns in either S1 or S2 and SA′ is a set of base-table columns in SA. Moreover, if a range predicate in RM becomes unbounded (e.g., after merging R.a<10 and R.a>5), the range predicate can be eliminated from RM. If, however, GM≠0, a corresponding column can be added to GM and SM so that range predicates can be evaluated with VM. To illustrate merging undertaken by the merging component 414, the following example is provided.
After views V1 and V2 are merged into VM, indexes over V1 and V2 can be promoted to VM. In other words, for each index I(K:S) over V1 (respectively, V2) an index IM(K′:S′) can be created, where K′ and S′ include columns in K and S mapped from V1 (respectively, V2) to VM. If the configuration 410 (C) is relaxed by merging V1 and V2 into VM, the reduced configuration (C′) can be created, where C′=C−{V1,V2}−IV
Along with merging, the reduction component 412 can employ the removal component 422 to aid in reduction of size of the configuration 410 to create the reduced configuration 424. For instance, any view V in the configuration 410 (C) can be removed to create the reduced configuration 424 (C′), so that C′=C−{V, I1, . . . , In}, where I1, . . . , In are indexes defined over V.
As the reduction component 412 can utilize various components to reduce size of the configuration 410, it is important that appropriate reductions are taken. For instance, while resulting in a substantially similar size, performing a merge operation on the configuration 410 can result in a much less optimal configuration as compared to performing a split operation on the configuration 410. Accordingly, the reduction component 412 can include a search component 426 that estimates value of performing disparate actions upon the configuration 410. This estimation essentially determines a benefit in space versus a loss in execution time, and can be employed in connection with selecting operations to perform on the configuration 410.
Now turning to
As stated above, the estimating component 508 can estimate an amount of space consumed by the configuration 502. In one example, an index I can be defined over a table T, such that I=(K:S) over such table. To estimate size of the index I, width of an entry in any of I's leaf nodes can be calculated as WL=ΣcεK∪S width(c), where width(c) is a system-dependent constant if c is a fixed-length column (e.g., integers are four bytes long), or is an average length of values of c in a database if c is a variable-length column (c's average length can be approximated by way of sampling). Similarly, width of an entry in an internal node of a B-Tree can be calculated as W1=ΣcεK width(c). WL and WI can then be employed to calculate a number of entries per page in leaf (PL) and internal (PI) nodes of the B-Tree. A total number of pages utilized by I can then be calculated as a sum of pages per level in the B-Tree. For a specific example, leaf nodes in the B-Tree fit in SO=┌|T|/PL┐ pages and level i (i≧1) nodes fit in Si=┌Si-1/P1┐ pages. Accordingly, size of indexes, such as the index 504, within the configuration 502 can be estimated, and such an estimate 510 can be output by the estimating component 508. It is understood that the above-described method for estimating space consumption of an index and/or a materialized view is but one manner of doing so, and other suitable manners are contemplated and intended to fall under the scope of the hereto-appended claims. Size of materialized views (e.g., the materialized view 506) can be estimated in a similar fashion. For instance, a space consumed by the materialized view 506 (V) can be estimated as a sum of sizes of each index (including a clustered index) defined over V. To approximate |V|, a cardinality module of an optimizer (such as the optimizer of
The estimating component 508 can further be employed to estimate cost associated with index transformations and view transformations. In other words, an estimate can be obtained relating to a performance measure that is affected by an index and/or view transformation. This estimated cost can then be analyzed together with estimated reductions in space of the configuration 502 upon undergoing a transformation, and a transformation can be selected by the search component (
Referring now to
Turning solely to
At 606, a physical configuration is determined by analyzing the execution plans. With more particularity, the physical configuration will include physical structures utilized in the execution plans—thus, the physical configuration can be one that is associated with least possible cost given the workload. At 608, the physical configuration can be progressively reduced in size by way of various relaxation techniques, some of which were described in detail in
Now referring to
UPDATE R SET a=b+1, c=c*c+5 WHERE a<10 AND d<20
This update can be separated into a pure query and an update shell, respectively.
SELECT b+1, c*c+5 FROM R WHERE a<10 AND d<20
UPDATE TOP(k) R SET a=0, c=0
At 704, for each query in the workload, an optimizer that receives the query generates requests relating to indexes and views that would be optimal with respect to such query. At 706, again for each query, a simulation of the indexes and views associated with each request can be generated. Thus, a plurality of indexes and views can result for each query. At 708, an execution plan is generated with respect to the query, wherein the execution plan utilizes the simulated indexes and views generated at 706. Thus, for each query, an optimal execution plan can be created. At 710, a determination is made regarding whether there are any more queries and/or updates remaining within the workload. If there are remaining queries and/or updates, the methodology returns to 702. If there are no remaining queries and/or updates within the workload, a configuration can be selected, wherein the configuration is a union of configurations from each of the execution plans that were generated at 708. In summary, if a workload includes ten queries, ten separate execution plans can be obtained (one for each query). A configuration with respect to the workload can then be obtained as a union of configurations from each execution plan.
Now turning to
Referring now to
In order to provide additional context for various aspects of the subject invention,
Generally, however, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular data types. The operating environment 1010 is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Other well known computer systems, environments, and/or configurations that may be suitable for use with the invention include but are not limited to, personal computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include the above systems or devices, and the like.
With reference to
The system bus 1018 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any variety of available bus architectures including, but not limited to, 8-bit bus, Industrial Standard Architecture (ISA), Micro-Channel Architecture (MSA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), Advanced Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), and Small Computer Systems Interface (SCSI). The system memory 1016 includes volatile memory 1020 and nonvolatile memory 1022. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 1012, such as during start-up, is stored in nonvolatile memory 1022. By way of illustration, and not limitation, nonvolatile memory 1022 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory 1020 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
Computer 1012 also includes removable/nonremovable, volatile/nonvolatile computer storage media.
It is to be appreciated that
A user enters commands or information into the computer 1012 through input device(s) 1036. Input devices 1036 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to the processing unit 1014 through the system bus 1018 via interface port(s) 1038. Interface port(s) 1038 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB). Output device(s) 1040 use some of the same type of ports as input device(s)
1036. Thus, for example, a USB port may be used to provide input to computer 1012, and to output information from computer 1012 to an output device 1040. Output adapter 1042 is provided to illustrate that there are some output devices 1040 like monitors, speakers, and printers among other output devices 1040 that require special adapters. The output adapters 1042 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 1040 and the system bus 1018. It should be noted that other devices and/or systems of devices provide both input and output capabilities such as remote computer(s) 1044.
Computer 1012 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 1044. The remote computer(s) 1044 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor based appliance, a peer device or other common network node and the like, and typically includes many or all of the elements described relative to computer 1012. For purposes of brevity, only a memory storage device 1046 is illustrated with remote computer(s) 1044. Remote computer(s) 1044 is logically connected to computer 1012 through a network interface 1048 and then physically connected via communication connection 1050. Network interface 1048 encompasses communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet/IEEE 802.3, Token Ring/IEEE 802.5 and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).
Communication connection(s) 1050 refers to the hardware/software employed to connect the network interface 1048 to the bus 1018. While communication connection 1050 is shown for illustrative clarity inside computer 1012, it can also be external to computer 1012. The hardware/software necessary for connection to the network interface 1048 includes, for exemplary purposes only, internal and external technologies such as, modems including regular telephone grade modems, cable modems and DSL modems, ISDN adapters, and Ethernet cards.
What has been described above includes examples of the subject invention. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the subject invention, but one of ordinary skill in the art may recognize that many further combinations and permutations of the subject invention are possible. Accordingly, the subject invention is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.