1. Technical Field
The present disclosure relates generally to database information distribution, and more particularly to, distribution of database information in dynamic database system configurations.
2. Related Art
A database is a collection of stored data that is logically related and that is accessible by one or more users or applications. In some database implementations, the amount of information needed for storage and processing may be cyclical in nature such that additional resources may be required for storage and accessing during different periods of time. Database flexibility allows additional resources to be added or removed to accommodate this cyclical nature.
In addressing the need for greater database capacity both for processing and storage, additional resources may be added. However, the addition of resources requires decisions to be made regarding the redistribution of previously-stored database information. For example, each database table should be analyzed to determine if redistribution should be performed. Redistribution of database resources also comes with associated costs regarding the use of the database for the actual redistribution of data. Also, in instances where the addition of resources is temporary, costs of re-redistributing the data prior to the removal of the additional resources may also require consideration. Thus, considerations regarding both the use of data being considered for redistribution and the costs of the actual data redistribution must be accounted for in order to make proper decisions regarding database usage efficiency.
According to one aspect of the disclosure, a database system may implement resource management for redistribution of stored database information in additional database resources. In one example, resource management may include detection of the presence of additional database resources, such as storage and processing resources. Based on conditions associated with the additional resources and database information selected for potential redistribution, the resource management may include a determination as to whether or not redistribution of the database information is to be performed.
According to another aspect of the disclosure, resource management may include a cost-benefit analysis to determine if redistribution of database information is to be performed. The database information may be redistributed when the benefit of redistribution is determined to be greater than the cost.
According to another aspect of the disclosure, the cost-benefit analysis may be associated with the access frequency of the database information under consideration for redistribution as well as the amount of original database resources and the total amount of database resources including the additional database resources. The cost-benefit analysis may also include consideration of other factors related to both the additional resources and the database information being considered for redistribution.
The aspects of the disclosure may be implemented in systems, a methods, or on computer-readable media separately or various combinations thereof.
The disclosure may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
A database management system may determine an adjustment in available resources. Adjustments may include both the addition and removal of database resources. Upon detection of the addition of resources, the database management system may determine if currently-existing database entities (e.g., tables, indices, etc.) are candidates for redistribution to the additional resources. Viable candidate entities for redistribution may be redistributed based on characteristics of both the additional resources and the candidate entities.
During operation the DBMS 100 may communicate through a network 110 to a plurality of processing units (PU) 112. Each processing unit 112 is individually designated as “PU x” with x referring to the particular processing unit 112, such as PU 1 in
In the configuration shown in
In one example, the DSFs 114 of
In storing tables making up a database, the database system 100 may use row partitioning, such that each table is broken down into rows and stored across the data-storage facilities 112. Row partitioning may be used in conjunction with the shared-nothing functionality such that each processing unit 112 is responsible for accessing only the rows of stored tables in the specifically-allocated memory locations. For example, each data storage facility 112 is shown in
The database system 100 may experience an equipment upgrade that provides additional resources 120. Upgrades may be permanent or temporary in nature, with temporary upgrades referring to adding resources with the intention to remove the additional resources after some desired length of time. As an example, in
The DBMS 102 may store a resource management module (RMM) 126. The RMM 126 may be executed by the processor 104. As described herein, “modules” are defined to include software, hardware or some combination thereof executable by one or more physical and/or virtual processors. Software modules may include instructions stored in the one or more memories that are executable by one or more processors. Hardware modules may include various devices, components, circuits, gates, circuit boards, and the like that are executable, directed, and/or controlled for performance by one or more processors.
The RMM 126 may include one or more sub-modules that are executed to perform various tasks. In one example, the RMM 126 may include a database resource detection module (DRDM) 128. The database resource detection module (DRDM) 128 may detect when adjustment of available resources of the database system 100, such as the addition and removal of resources, as indicated by the resource adjustment detection indicator 130. In one example, the resource adjustment indicator 130 may represent a message generated by the network 110 upon connection or removal of the additional resources 120. One or more messages may be generated when multiple resources are added or removed. In alternative examples, the resource adjustment indicator 130 may be any suitable indication that informs the database management system 102 that resources have been added or removed from connection with the network 110. Detection of the addition or removal of resources from the database system 100 may be automatically detected by the DRDM 128 or may be detected based on user indication of the adjustment of resources.
Upon detection of the additional resources 120 being added to the database system 100, the RMM 126 may determine if redistribution of existing database information stored in the database system 100 is desired. Redistribution may be advantageous in allowing additional processing units 122 to process data along with the processing units 112 already available, which may provide faster response times to various queries and/or workloads provided to the database system. Various factors may be evaluated whether redistribution should occur.
In one example, the RMM 126 may include a redistribution evaluation module (REM) 132. The REM 132 may evaluate various factors, such as those related to both the additional resources 120, as well as the characteristics of the database information identified for potential redistribution. Once redistribution is determined, the database information selected for redistribution may be redistributed in accordance with the determination of the REM 132.
In
Upon determination that candidate tables for redistribution exist, the REM 132 may determine information about the database resources, including the additional resources 120 (304). In one example, tables stored in the DSFs 114 and 124 may be hash partitioned data, such as that used in relational tables. In such a configuration, the REM 132 may consider various parameters regarding the resources of the database. For example, the REM 132 may consider the number of processing units 112 and the number of the additional processing units 122. The REM 132 may also consider the duration of time that the additional processing units 122 may be available.
Upon determination of the database resource information, the REM 132 may select a candidate table for redistribution (306). The selection may be a random selection for the tables identified as candidates for redistribution or may be ordered based on various considerations, such as table characteristics, such as size, data type, etc. Upon selection of a candidate table, the REM 132 may determine information about the candidate table (308). The REM 132 may determine various types of information, such as characteristics of the candidate table. In one example, for database tables storing hash partitioned data, the REM 132 may determine information such as: the frequency of index join or full table scan (FTS) access for the candidate table; the cost to read a row of the candidate table; the cost to write a row of the candidate table; and the cardinality of the candidate table. The cost to read a row of the candidate table and write to a row of the candidate table may be based in time, e.g., cost refers to the amount of time it takes to write to or read a row of the candidate table.
Upon determination of the candidate table information, the REM 132 may perform a redistribution analysis to determine if the candidate table should be redistributed to the additional resources 120 (310). In one example, the redistribution analysis may be a cost-benefit analysis, where the cost (in time) of redistribution is compared to the benefit (in time). In one example, the cost associated with redistributing table in the database system 100 may be defined as:
where:
N1=number of processing units in the database system excluding additional processing units;
N2=number of processing units in the database system including additional processing units;
R=cost to read a row to a candidate table;
|T1|=cardinality of the candidate table.
The relationship between N2 and N1 may be represented as:
N2=μ*N1 Eqn. 2
where μ is the ratio of N2/N1. Eqn. 1 assumes that the cost to read a row of the candidate table equals the cost to write a row to the candidate table, however this may be generalized. Thus, costs that arise due to redistributing hash-partitioned data may include reading a candidate table on N1 processing units and redistributing the table. Costs may also arise from, if rows being inserted or replacing other rows are sorted, merging the candidate table on the processing units including the additional processing units—no input/output (I/O) to and from the tables is involved. Costs may also arise from writing the candidate table on N2 processing units.
The benefit associated with redistributing the candidate table may be defined as:
where NA=frequency of index join or FTS access * duration of the expected time the additional resources are available. Accordingly, the REM 132 may determine that redistribution is appropriate when benefit>cost, such that:
which simplifies to:
(N2−N1)*NA>(N1+N2) Eqn. 5
Or
NA>(N1+N2)/(N2−N1) Eqn. 6
N2 may be expressed is μ*N1, where u is the ratio of N2/N1. Substituting Eqn. 2 into Eqn. 5 provides:
NA>(μ+1)/(μ−1) Eqn. 7
Based on the relationship in Eqn. 6, as an example, if the additional resources 120 provided a 10% upgrade of the currently available resources, such as the processing units 112, Eqn. 6 is reduced to NA>21. In other words, the table considered for redistribution would require accessing at least twenty-one times during the upgraded duration for the benefit of redistribution to outweigh the cost.
In one example, a predetermined threshold may be used in order to determine if a table is to be redistributed. The result of Eqn. 6 may be determined for a candidate table resulting in a number of table accesses required for the benefit of redistribution to be greater than the cost. This number of accesses may be compared to a threshold amount. If the number is greater than the threshold amount, the candidate table may be redistributed. The predetermined threshold may be global in nature or a number of predetermined thresholds may be used on a table-by-table basis.
In alternative examples, different data partitioning may be used such as randomly partitioned data. In one example, such a scheme may distribute rows from existing instances to new ones upon upgrade of additional resources to make a copy and discard it when the additional resources are surrendered. The cost may be defined as:
where W=cost to write a row to a candidate table. If it is assumed that writes of a row can proceed simultaneously with reads of rows, the cost becomes:
Cost=R*|T|/N2 Eqn. 9
A cost benefit analysis may be expressed as:
(N2−N1)*R*|T|*NA/(N1*N2)>R*|T|/N2 Eqn. 10
which simplifies to:
(N2−N1)*NA>N1 Eqn. 11
Or
NA>N1/(N2−N1) Eqn. 12
where using the relationship N2=μ*N1 provides:
NA>1/(μ−1) Eqn. 13
Thus, as an example, for a 10% upgrade, a table must be accessed at least 10 times for the redistribution to provide a benefit greater than the cost.
In the randomly-partitioned example, costs double if instead of making a copy, data is re-redistributed back into the original distribution. Such considerations should be weighed against making updates while a copy is extant and additional memory space tradeoffs.
Other alternative examples may implement a hybrid approach, where data is initially maintained using hash partitioning, excepting permanent upgrades, it is cached in a randomly partitioned way. In such a configuration, full table scans can leverage the added parallelism but the index joins and other index accesses can go the original data set.
Upon determining the candidate table is to be redistributed, the table may be redistributed to the additional resources (314). Once redistribution has occurred or the candidate table is not to be redistributed, presence of additional candidate tables is determined (316). If additional candidate tables exist, the next candidate table is selected (318) to determination if redistribution is to occur. If no additional candidate tables exist, the activity by the RMM 126 may cease until being reactivated.
The RBDMS 402 may include one or more processing units used to manage the storage, retrieval, and manipulation of data in the data-storage facilities. The array of processing units may include an array of processing nodes 406 that manage the storage, retrieval, and manipulation of data included in a database. In
In one example, each processing node 406 may include one or more physical processors 408 and memory 410. The processing nodes 406 may include one or more other processing unit arrays such as parsing engine (PE) modules 412 and access modules (AM) 414. The access modules 110 may be access modules processors (AMPs), such as those implemented in the Teradata Active Data Warehousing System®. The parsing engine modules 108 and the access modules 110 may each be virtual processors (vprocs) and/or physical processors. In the case of virtual processors, the parsing engine modules 412 and access modules 414 may be executed by one or more physical processors, such as those that may be included in the processing nodes 406. For example, in
The RBDMS 102 stores data in one or more tables in DSFs 416. The DSF's 416 may be similar to the DSFs 114 described with regard to
Rows of table 3 may be stored across multiple data-storage facilities 416. In
Each parsing engine module 412, upon receiving an incoming database query, may apply an optimizer module (OPT) 424 to assess the best plan for execution of the query. An example of an optimizer module 424 is shown in
The RBDMS 402 may include an active system management (ASM) module 426. The ASM module 426 may be implemented as a “closed-loop” system management (CLSM) architecture capable of satisfying a set of workload-specific goals. In other words, the RBDMS 402 is a goal-oriented workload management system capable of supporting complex workloads and capable of self-adjusting to various types of workloads. The ASM module 426 may communicate with each optimizer module 424 (as shown in
The ASM module 426 operation has four major phases: 1) assigning a set of incoming request characteristics to workload groups, assigning the workload groups to priority classes, and assigning goals (referred to as Service Level Goals or SLGs) to the workload groups; 2) monitoring the execution of the workload groups against their goals; 3) regulating (e.g., adjusting and managing) the workload flow and priorities to achieve the SLGs; and 4) correlating the results of the workload and taking action to improve performance. In accordance with disclosed embodiments, the ASM module 426 is adapted to facilitate control of the optimizer module 424 pursuit of robustness with regard to workloads or queries.
An interconnection 428 allows communication to occur within and between each processing node 406. For example, implementation of the interconnection 428 provides media within and between each processing node 406 allowing communication among the various processing units. Such communication among the processing units may include communication between parsing engine modules 108 associated with the same or different processing nodes 106, as well as communication between the parsing engine modules 412 and the processing modules 414 associated with the same or different processing nodes 406. Through the interconnection 428, the access modules 414 may also communicate with one another within the same associated processing node 406 or other processing nodes 406.
The interconnection 428 may be hardware, software, or some combination thereof. In instances of at least a partial-hardware implementation the interconnection 428, the hardware may exist separately from any hardware (e.g., processors, memory, physical wires, etc.) included in the processing nodes 406 or may use hardware common to the processing nodes 406. In instances of at least a partial-software implementation of the interconnection 428, the software may be stored and executed on one or more of the memories 410 and processors 408 of the processor nodes 406 or may be stored and executed on separate memories and processors that are in communication with the processor nodes 406. In one example, interconnection 428 may include multi-channel media such that if one channel ceases to properly function, another channel may be used. Additionally or alternatively, more than one channel may also allow distributed communication to reduce the possibility of an undesired level of communication congestion among processing nodes 406.
In
Upon detection of the availability of the additional resources, illustrated through additional resource detection 440, the RMM 438 may generate an additional resource detection notification 442, which may be received by the client system 418. In one example, the additional resource detection notification 442 may indicate that the detection of additional resources has occurred and, once transmitted to the client system 418, the RMM 438 may automatically begin analyzing database tables in order to determine if redistribution is to occur. In alternative examples, the RMM 438 may transmit the additional resource detection notification 442 to the client system 418 and await further instruction before proceeding with any determinations regarding table redistribution.
In
The term “memory” or “memories” as used herein may include one or more memories and may be computer-readable storage media or memories, such as a cache, buffer, RAM, removable media, hard drive, flash drive or other computer-readable storage media. Computer-readable storage media may include various types of volatile and nonvolatile storage media. The term “processor” or “processors” as used herein may include various processing techniques that may be implemented by the processors such as multiprocessing, multitasking, parallel processing and the like, for example.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
Number | Name | Date | Kind |
---|---|---|---|
5557791 | Cheng et al. | Sep 1996 | A |
5574900 | Huang et al. | Nov 1996 | A |
5765146 | Wolf et al. | Jun 1998 | A |
5797000 | Bhattacharya et al. | Aug 1998 | A |
6009265 | Huang et al. | Dec 1999 | A |
6092062 | Lohman et al. | Jul 2000 | A |
6112198 | Lohman et al. | Aug 2000 | A |
6345267 | Lohman et al. | Feb 2002 | B1 |
6505189 | On Au et al. | Jan 2003 | B1 |
7035851 | Sinclair et al. | Apr 2006 | B1 |
7299468 | Casey et al. | Nov 2007 | B2 |
7739308 | Baffier et al. | Jun 2010 | B2 |
7801848 | Hazlewood et al. | Sep 2010 | B2 |
8150836 | Xu et al. | Apr 2012 | B2 |
8176497 | Yang et al. | May 2012 | B2 |
8386540 | McAlister et al. | Feb 2013 | B1 |
20010047482 | Harris et al. | Nov 2001 | A1 |
20020013802 | Mori et al. | Jan 2002 | A1 |
20040022237 | Elliott et al. | Feb 2004 | A1 |
20040221290 | Casey et al. | Nov 2004 | A1 |
20050050050 | Kawamura | Mar 2005 | A1 |
20060069761 | Singh et al. | Mar 2006 | A1 |
20070022133 | Barghouthi | Jan 2007 | A1 |
20070033578 | Arnold et al. | Feb 2007 | A1 |
20070143380 | Plow et al. | Jun 2007 | A1 |
20070143753 | Vasile | Jun 2007 | A1 |
20090183152 | Yang et al. | Jul 2009 | A1 |
20090265306 | Barsness et al. | Oct 2009 | A1 |
20100082540 | Isaacson et al. | Apr 2010 | A1 |