It is sometimes useful to emulate the operation of one database system, i.e., a target database system, on a test database system. Such emulations can be helpful in troubleshooting the processing of requests, i.e., queries and utilities, and in testing proposed hardware or software changes without interrupting operations on the target database system.
The test database system often uses an optimizer to generate parallel query plans or strategies for executing a request and then to select the plan to execute. Such optimizers often use table statistics, which are typically stored in a data dictionary, in their analysis. One conventional approach for providing such data to an optimizer in a test database system is to extract table statistics from the target database system's data dictionary, store the table statistics in the test database system's data dictionary, and run the emulation on the test database system. Using such an approach when the test system is being used to emulate more than one target database system is a challenge, especially when there are many users logged on to the test system.
In general, in one aspect, the invention features a method comprising reading statistics from a data dictionary on a target system. The statistics include statistics for a table in a database in the target system. The method includes exporting the statistics to a test system separate from the target system. The method further includes receiving a command to use the statistics in emulation and, in response, copying the statistics to a cache in the test system and emulating the target system on the test system using the statistics in the test system's cache.
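The flow described above can be sketched as follows. This is an illustrative model only; the class and method names (TargetSystem, TestSystem, import_statistics, set_statistics) are hypothetical and do not correspond to any actual product API.

```python
# Hypothetical sketch of the statistics export/import flow: read statistics
# from the target's data dictionary, export them to the test system, and on
# command copy them into the test system's cache for use during emulation.

class TargetSystem:
    def __init__(self, name, dictionary_stats):
        self.name = name
        self.data_dictionary = dictionary_stats  # {(database, table): stats}

class TestSystem:
    def __init__(self):
        self.qcd = {}    # private versions of exported statistics, per target
        self.cache = {}  # consulted by the optimizer before the dictionary

    def import_statistics(self, target):
        # Store a private version of the target's statistics in the QCD.
        self.qcd[target.name] = dict(target.data_dictionary)

    def set_statistics(self, target_name):
        # On command, copy the private version into the cache.
        self.cache.update(self.qcd[target_name])

prod = TargetSystem("prod", {("sales_db", "orders"): {"row_count": 1_000_000}})
test = TestSystem()
test.import_statistics(prod)   # export to the test system
test.set_statistics("prod")    # command to use the statistics in emulation
# The optimizer now sees the target's row count when emulating:
assert test.cache[("sales_db", "orders")]["row_count"] == 1_000_000
```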
Implementations may include one or more of the following. Receiving a command to use the stored statistics in emulation may include receiving a command specifying a level of emulation. Copying the statistics to a cache in the test system may include copying the statistics to a segment in the cache in the test system corresponding to the level of emulation. Emulating the target system on the test system using the statistics in the test system's cache may include emulating the target system on the test system using the statistics in the segment in the cache corresponding to the level of emulation. Receiving a command to use the stored statistics in emulation may include receiving a command in which the specified level of emulation is selected from a group of levels of emulation consisting of system level, interface processor level, session level, and request level. Receiving a command to use the stored statistics may include receiving the command from a first user. Copying the statistics to the segment in the cache in the test system corresponding to the level of emulation may include copying a private version of the statistics to the segment in the cache in the test system corresponding to the level of emulation, the private version of the statistics being accessible only by the first user and not by any other user. Emulating the target system on the test system may include using the private version of the statistics in the segment in the cache corresponding to the level of emulation. Emulating the target system on the test system using the private version of the statistics in the segment in the cache corresponding to the level of emulation may include optimizing the processing of a first request submitted by the first user using the private version of the statistics and optimizing the processing of a second request submitted by a second user using statistics other than the private version of the statistics. 
Receiving a command to use the stored statistics in emulation may include receiving a command specifying the database and the table. Copying the statistics to a cache in the test system may include copying the statistics for the table to the cache. Emulating the target system on the test system using the statistics in the test system's cache may include emulating the target system on the test system using the statistics for the table in the cache. Receiving a command to use the stored statistics in emulation may include receiving a command from a user specifying the database, the table, and a level of emulation. Copying the statistics to a cache in the test system may include copying the statistics for the table to a segment in the cache corresponding to the level of emulation. Emulating the target system on the test system using the statistics in the test system's cache may include emulating the target system on the test system using the statistics for the table in the segment in the cache corresponding to the level of emulation.
In general, in another aspect, the invention features a database system comprising one or more nodes, a plurality of CPUs, each of the one or more nodes providing access to one or more CPUs. The database system further includes a plurality of virtual processes, each of the one or more CPUs providing access to one or more virtual processes. Each virtual process is configured to manage data, including rows from the set of database table rows, stored in one of a plurality of data-storage facilities. The database system further includes a process configured to emulate a target system on a test system by reading statistics from a data dictionary on a target system. The statistics include statistics for a table in a database in the target system. The process further includes exporting the statistics to a test system separate from the target system. The process further includes receiving a command to use the statistics in emulation and, in response, copying the statistics to a cache in the test system and emulating the target system on the test system using the statistics in the test system's cache.
In general, in another aspect, the invention features a computer program, stored in a computer-readable tangible medium. The computer program includes executable instructions that cause a computer to read statistics from a data dictionary on a target system. The statistics include statistics for a table in a database in the target system. The computer program includes executable instructions that cause a computer to export the statistics to a test system separate from the target system. The computer program includes executable instructions that cause a computer to receive a command to use the statistics in emulation and, in response, copy the statistics to a cache in the test system and emulate the target system on the test system using the statistics in the test system's cache.
In one embodiment, shown in
In one embodiment, the target database system 110A is located at a first customer site, while the target database system 110B is located at a second customer site. In one embodiment, other target database systems, represented by the ellipsis 150, are located at other customer sites. Each target database system 110 includes database management software 112 that manages access of data in a respective database 114. In accordance with some embodiments, the database 114 is distributed across plural access modules and corresponding storage modules in each target database system 110. For example, if the target database system 110A is a TERADATA® database system from Teradata Corporation, then the access modules are Access Module Processes (AMPs). The target database system includes one node or plural nodes, with each node capable of executing one or plural access modules.
In one embodiment, each target database system 110 includes a data dictionary 116, which includes statistics 118. In one embodiment, the statistics 118 include data regarding tables managed by the database management software 112 such as:
In one embodiment, each target database system 110 includes a cache 119. In one embodiment, the cache 119 is a relatively fast memory where the database management software 112 looks for data that has been recently accessed before searching relatively slow mass storage devices, such as disk drives or optical drives.
As mentioned above, in one embodiment, the target database systems 110 are coupled to a data network 140. In one embodiment, the data network 140 is a private network (such as a local area network or wide area network). In one embodiment, the data network is a public network (such as the Internet). In one embodiment, the emulation client system 130 is also coupled to the data network 140. In one embodiment, various data files 134 stored in the emulation client system 130 contain statistics data exported from respective database systems 110. In one embodiment, the system emulation tool 132, executable in the emulation client system 130, exports statistics data from the target database systems 110A, 110B over the data network 140. In one embodiment, the system emulation tool imports the statistics data into the test system 120. In one embodiment, if the emulation client system 130 is integrated with the test database system 120, the system emulation tool 132 and data files 134 are located on the same platform as the components of the test database system 120. In one embodiment, the statistics data imported to the test system 120 are imported into the query capture database (QCD) 121 as private versions of statistics 122. In one embodiment, each target database system 110 being emulated has its own private version of statistics 122 stored in the QCD 121.
In one embodiment, the test database system 120 includes a data dictionary 123 that contains statistics 124. In one embodiment, the test database system's QCD 121 is a separate database from its data dictionary 123. In one embodiment, the statistics 124 contain statistics such as those described above with respect to element 118 collected for tables stored on the test database system 120.
In one embodiment, database management software 125 in the test system 120 includes an optimizer, which selects a query plan for a given query that accesses data in a database 126 managed by the database management software 125. In one embodiment, the optimizer uses the statistics 124 stored in the data dictionary 123 to identify and select the most efficient query plan (from among plural query plans) for a given query. In one embodiment, the optimizer checks a cache 127 for recently accessed data before accessing the statistics 124 in the data dictionary 123.
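The cache-before-dictionary lookup described above can be illustrated with a short sketch. The function name and data shapes here are hypothetical, not part of any actual database API.

```python
def read_statistics(table_key, cache, data_dictionary):
    """Return statistics for a table, preferring the cache over the
    data dictionary. A cache hit means the optimizer plans with the
    emulated target system's statistics rather than the test system's
    own dictionary statistics."""
    if table_key in cache:
        return cache[table_key]
    return data_dictionary[table_key]

dictionary = {("db", "t1"): {"row_count": 100}}        # test system's own stats
cache = {("db", "t1"): {"row_count": 5_000_000}}       # emulated target's stats

# With the target's statistics cached, the optimizer sees the large table:
assert read_statistics(("db", "t1"), cache, dictionary)["row_count"] == 5_000_000
# With an empty cache, it falls back to the test system's dictionary:
assert read_statistics(("db", "t1"), {}, dictionary)["row_count"] == 100
```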
In one embodiment, the test database system 120 includes a visual explain and compare module 128 for displaying query plan(s) selected by the optimizer in a user interface of the test system. In one embodiment, instead of executing in the test database system 120, the visual explain and compare module 128 executes in the emulation client system 130, as represented by the display 136 and graphical user interface (GUI) 138. In another embodiment, instead of using the visual explain and compare module 128, some other user interface is provided to present the results of query plan selection by the optimizer. In one embodiment, if the visual explain and compare module 128 is used, the graphical user interface 138 is provided, in which steps of the query plan are depicted as icons or other image elements, with icons connected by lines to represent the flow of the steps in the query plan. The icons that are displayed by the visual explain and compare module 128 are designed to represent database operations performed on data rows such as relational algebra operations (e.g., select, project, join); physical algebraic operators such as nested join, merge join, hash join, and so forth; data movement/organization operations such as sorts, redistribution, duplication; and database objects such as tables and views.
Another feature of the visual explain and compare module 128, in accordance with one embodiment, is the ability to graphically or visually compare plural query plans for a given query, with the plural query plans generated under different conditions. This provides the user with the ability to analyze differences between query plans.
Returning to the sample architecture shown in
For the case in which N virtual processors are running on a single physical processor, the single physical processor swaps between the set of N virtual processors.
For the case in which N virtual processors are running on an M-processor node, the node's operating system schedules the N virtual processors to run on its set of M physical processors. If there are 4 virtual processors and 4 physical processors, then typically each virtual processor would run on its own physical processor. If there are 8 virtual processors and 4 physical processors, the operating system would schedule the 8 virtual processors against the 4 physical processors, in which case swapping of the virtual processors would occur.
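The scheduling arithmetic above can be sketched with a simple round-robin assignment. This is an illustrative model only, not the actual operating-system scheduler, and the function name is hypothetical.

```python
def assign_vprocs(n_virtual, m_physical):
    """Map N virtual processors onto M physical processors, round-robin."""
    return {vp: vp % m_physical for vp in range(n_virtual)}

# 4 virtual processors on 4 physical: each runs on its own physical processor.
assert assign_vprocs(4, 4) == {0: 0, 1: 1, 2: 2, 3: 3}

# 8 virtual processors on 4 physical: two virtual processors share each
# physical processor, so the operating system swaps between them.
mapping = assign_vprocs(8, 4)
assert sorted(mapping.values()).count(0) == 2
```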
Each of the processing modules 210_1 . . . N manages a portion of a database that is stored in a corresponding one of the data-storage facilities 220_1 . . . N. Each of the data-storage facilities 220_1 . . . N includes one or more disk drives. The DBS may include multiple nodes 205_2 . . . N in addition to the illustrated node 205_1, connected by extending the network 215.
The system stores data in one or more tables in the data-storage facilities 220_1 . . . N. The rows 225_1 . . . Z of the tables are stored across multiple data-storage facilities 220_1 . . . N to ensure that the system workload is distributed evenly across the processing modules 210_1 . . . N. A parsing engine 230 organizes the storage of data and the distribution of table rows 225_1 . . . Z among the processing modules 210_1 . . . N. Similar to AMPs, the system can be configured to support a large number of parsing engines. Parsing engines are referred to as interface processors (IFPs). The parsing engine 230 also coordinates the retrieval of data from the data-storage facilities 220_1 . . . N in response to queries received from a user at a mainframe 235 or a client computer 240. The target database systems 110 and/or the test database system 120 usually receive queries and commands to build tables in a standard format, such as SQL.
In one implementation, the rows 225_1 . . . Z are distributed across the data-storage facilities 220_1 . . . N by the parsing engine 230 in accordance with their primary index. The primary index defines the columns of the rows that are used for calculating a hash value. The function that produces the hash value from the values in the columns specified by the primary index is called the hash function. Some portion, possibly the entirety, of the hash value is designated a “hash bucket”. The hash buckets are assigned to data-storage facilities 220_1 . . . N and associated processing modules 210_1 . . . N by a hash bucket map. The characteristics of the columns chosen for the primary index determine how evenly the rows are distributed.
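The primary-index hashing scheme described above can be sketched as follows. The hash function, bucket count, and bucket map here are illustrative stand-ins (the actual hash function and map sizes are implementation details not specified in this description).

```python
import hashlib

NUM_BUCKETS = 16
# Hash bucket map: assigns each hash bucket to a data-storage facility
# and its associated processing module (4 illustrative modules here).
BUCKET_MAP = {b: b % 4 for b in range(NUM_BUCKETS)}

def hash_bucket(row, primary_index_columns):
    # The hash function runs over the primary-index column values only.
    key = "|".join(str(row[c]) for c in primary_index_columns)
    digest = hashlib.md5(key.encode()).digest()
    # Some portion of the hash value is designated the "hash bucket".
    return digest[0] % NUM_BUCKETS

row = {"order_id": 42, "customer": "acme"}
bucket = hash_bucket(row, ["order_id"])  # which bucket the row hashes to
module = BUCKET_MAP[bucket]              # which processing module stores it
assert 0 <= bucket < NUM_BUCKETS and 0 <= module < 4
```

Rows with the same primary-index values always hash to the same bucket, so the choice of primary-index columns directly controls how evenly rows spread across the modules.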
In an example system, the parsing engine 230 is made up of three components: a session control 300, a parser 305, and a dispatcher 310, as shown in
Once session control 300 allows a session to begin, a user may submit a SQL request, which is routed to parser 305. As illustrated in
In use, in one embodiment, as shown in
In one embodiment, the statistics are exported from the target database systems 110 to the emulation client system 130, as indicated by the arrows from the target database systems 110A, 110B to the emulation client system 130 (block 510). In one embodiment, other information, including DDL statements, DML statements, cost-related information, actual data associated with the database 114, and/or random samples of that data are exported from the target database systems 110A, 110B to the emulation client system 130, as described in U.S. Pat. No. 7,155,428. In one embodiment, the exporting process is coordinated by the system emulation tool 132. In one embodiment, the exporting process can be controlled and monitored through the GUI 138 on the display 136.
In one embodiment, the statistics are stored in the QCD 121 in the test database system (block 515). In one embodiment, a separate private version of the statistics 122 is maintained for each target database system 110 being emulated.
In one embodiment, upon command, the private version of the statistics for a target database system specified in the command is copied into a cache 127 in the test database system 120 (block 520). The command is described in more detail below.
In one embodiment, the test database system 120 emulates the target database system 110 (block 530). When the test database system's database management software 125 attempts to read the statistics 124 from its data dictionary 123, it will first check the cache 127, where it will find the statistics for the target database system 110. It will use those statistics rather than those stored in the data dictionary 123.
As a result, in one embodiment, the statistics 124 in the test database system's data dictionary 123 are not disturbed. Further, in one embodiment, each emulation of a target database system 110 can proceed with its own set of statistics without disturbing the statistics for one or more of the other target database systems, which maintains the integrity of the system when multiple users are logged on to the test system. That is, using this technique, the emulations for one user need not be affected by emulations for another user.
In one embodiment of block 520, shown in
One embodiment of the syntax of the command is shown in
In one embodiment, the variable “target_system_name” is the name of the target system to be emulated. In one embodiment, the variable “table_name” is the name of the table for which statistics emulation is to be performed. In one embodiment, the variable “database_name” is the name of the database in which the table is defined.
In one embodiment, the cache 127 is segmented into 4 levels of segments, as shown in
In one embodiment, issuing the following command:
In one embodiment, issuing the following command:
In one embodiment, the lower layers override the higher layers. For example, assume that a first user (of many possible users) of the test database system issues a DIAGNOSTIC-SET STATISTICS command at the SYSTEM level causing statistics from target database system 110A to be copied from the QCD into the SYSTEM level segment of the cache 127. If no other DIAGNOSTIC-SET STATISTICS commands have been issued, the statistics stored in the SYSTEM level segment of the cache 127 apply to all users of the system.
Assume that a second user of the system issues a DIAGNOSTIC-SET STATISTICS command at the IFP level, causing statistics from target database system 110B to be copied from the QCD into the IFP level segment of the cache 127. In that case, in one embodiment, the IFP level segment of the cache 127 would override the SYSTEM level segment of the cache 127 for the second user, assuming the second user is logged on to that IFP. A third user could issue a DIAGNOSTIC-SET STATISTICS command at the IFP level and the resulting statistics would apply for the third user. A fourth user could issue a DIAGNOSTIC-SET STATISTICS command at the SESSION level and the resulting statistics would override the SYSTEM and IFP level statistics for that user (i.e., the fourth user). A fifth user could issue a DIAGNOSTIC-SET STATISTICS command at the REQUEST level and the resulting statistics would override the SYSTEM, IFP, and SESSION levels for that user (i.e., the fifth user) for that request.
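The override precedence described above can be sketched as a lookup that checks the most specific cache segment first. The data shapes and function name are hypothetical, intended only to illustrate how lower (more specific) levels override higher ones.

```python
# Levels ordered from most specific to least specific; a hit at an
# earlier level overrides statistics stored at later levels.
LEVELS = ["REQUEST", "SESSION", "IFP", "SYSTEM"]

def effective_statistics(cache_segments, user_context):
    """Return the statistics that apply to a user, checking the most
    specific cache segment first and falling back toward SYSTEM."""
    for level in LEVELS:
        segment = cache_segments.get(level, {})
        key = user_context.get(level)
        if key in segment:
            return segment[key]
    return None

segments = {
    "SYSTEM": {"*": "stats_from_110A"},    # first user's system-wide command
    "IFP": {"ifp_7": "stats_from_110B"},   # second user's IFP-level command
}
# The second user, logged on to ifp_7: the IFP segment overrides SYSTEM.
assert effective_statistics(segments, {"IFP": "ifp_7", "SYSTEM": "*"}) == "stats_from_110B"
# A user on a different IFP still sees the SYSTEM-level statistics.
assert effective_statistics(segments, {"IFP": "ifp_3", "SYSTEM": "*"}) == "stats_from_110A"
```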
The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.
Number | Name | Date | Kind |
---|---|---|---|
5181017 | Frey, Jr. et al. | Jan 1993 | A |
5303383 | Neches et al. | Apr 1994 | A |
5398199 | Lefons | Mar 1995 | A |
5488570 | Agarwal | Jan 1996 | A |
5495578 | Rohrbaugh et al. | Feb 1996 | A |
5508942 | Agarwal | Apr 1996 | A |
5511003 | Agarwal | Apr 1996 | A |
5515296 | Agarwal | May 1996 | A |
5606693 | Nilsen et al. | Feb 1997 | A |
5625815 | Maier et al. | Apr 1997 | A |
5640584 | Kandasamy et al. | Jun 1997 | A |
5655080 | Dias et al. | Aug 1997 | A |
5819066 | Bromberg et al. | Oct 1998 | A |
5857180 | Hallmark et al. | Jan 1999 | A |
5857192 | Fitting | Jan 1999 | A |
5864842 | Pederson et al. | Jan 1999 | A |
5870752 | Gibbons et al. | Feb 1999 | A |
5872904 | McMillen et al. | Feb 1999 | A |
5873079 | Davis, III et al. | Feb 1999 | A |
5884299 | Ramesh et al. | Mar 1999 | A |
5940819 | Beavin et al. | Aug 1999 | A |
5943666 | Kleewein et al. | Aug 1999 | A |
5950188 | Wildermuth | Sep 1999 | A |
5970495 | Baru et al. | Oct 1999 | A |
6067542 | Carino, Jr. | May 2000 | A |
6208955 | Provan et al. | Mar 2001 | B1 |
6223171 | Chaudhuri et al. | Apr 2001 | B1 |
6272487 | Beavin et al. | Aug 2001 | B1 |
6289334 | Reiner et al. | Sep 2001 | B1 |
6298349 | Toyoshima et al. | Oct 2001 | B1 |
6370522 | Agarwal et al. | Apr 2002 | B1 |
6381604 | Caughran et al. | Apr 2002 | B1 |
6401083 | Agarwal et al. | Jun 2002 | B1 |
6414307 | Gerlach et al. | Jul 2002 | B1 |
6418389 | Peter et al. | Jul 2002 | B2 |
6430556 | Goldberg et al. | Aug 2002 | B1 |
6434545 | MacLeod et al. | Aug 2002 | B1 |
6477523 | Chiang | Nov 2002 | B1 |
6487547 | Ellison et al. | Nov 2002 | B1 |
6507842 | Grey et al. | Jan 2003 | B1 |
6564209 | Dempski et al. | May 2003 | B1 |
6615120 | Rother | Sep 2003 | B1 |
6618719 | Andrei | Sep 2003 | B1 |
6662312 | Keller | Dec 2003 | B1 |
6738756 | Brown et al. | May 2004 | B1 |
6801903 | Brown et al. | Oct 2004 | B2 |
7010521 | Hinshaw et al. | Mar 2006 | B2 |
7155428 | Brown et al. | Dec 2006 | B1 |
7162472 | Galindo-Legaria et al. | Jan 2007 | B2 |
7185000 | Brown et al. | Feb 2007 | B1 |
7246111 | Chaware et al. | Jul 2007 | B1 |
7370043 | Shelton et al. | May 2008 | B1 |
20030016237 | Hickey | Jan 2003 | A1 |
20030149970 | Shanbhogue et al. | Aug 2003 | A1 |
20040167891 | Galindo-Legaria et al. | Aug 2004 | A1 |
20050210010 | Larson et al. | Sep 2005 | A1 |
Entry |
---|
Blasgen, Michael W., et al., “On the Evaluation of Queries in a Relational Data Base System”, IBM Research Report RJ 1745, (Apr. 8, 1976),1-44. |
Blasgen, Michael W., et al., “Storage and Access in Relational Data Bases”, IBM Syst. J. No. 4,(1977),363-377. |
Bourbakis, N. G., “Emulating Human Visual Perception for Measuring Difference in Images using an SPN Graph Approach”, IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, Vol. 32, No. 2, (Apr. 1, 2002),191-201. |
Brown, Douglas P., et al., “Analysis Method and Apparatus for a Parallel System”, U.S. Appl. No. 09/608,977, filed Jun. 30, 2000. |
Brown, Douglas P., et al., “Collecting and/or Presenting Demographics in a Database System”, U.S. Appl. No. 09/976,632, filed Oct. 12, 2001. |
Brown, Douglas P., et al., “Collecting Statistics in a Database System”, U.S. Appl. No. 09/976,634, filed Oct. 12, 2001. |
Brown, Douglas P., et al., “Emulating a Database System”, U.S. Appl. No. 10/039,283, filed Dec. 31, 2001. |
Brown, Douglas P., et al., “Index Selection in a Database System”, U.S. Appl. No. 09/977,038, filed Oct. 12, 2001. |
Brown, Douglas P., et al., “Method and Apparatus for Presenting Query Plans”, U.S. Appl. No. 09/608,976, filed Jun. 30, 2000. |
Chamberlin, D. D., et al., “Sequel 2: A Unified Approach to Data Definition, Manipulation, and Control”, IBM J. Res. Develop., (Nov. 1, 1976). |
Chamberlin, D. D., et al., “Views, Authorization, and Locking in a Relational Data Base System”, National Computer Conference, (1975),425-430. |
Cheu, Dwight et al., “SQL Language Reference Manual”, Version 6.0, Oracle,(Feb. 1, 1990),5-1 to 5-5 and 5-96 to 5-97. |
Ghosh, Saumyendu et al., “Global Implementation of ERP Software-Critical Success Factors on Upgrading Technical Infrastructure”, IEMC '03, Managing Technology Driven Organization Side of Innovation and Change, (Nov. 2-4, 2003),320-324. |
Kitsuregawa, Masaru et al., “Query Execution for Large Relations on Functional Disk System”, Institute of Industrial Science, University of Tokyo, (1989),159-167. |
Klindt, Jerry et al., “Copying a Portion of a Database Structure Associated with a Query”, U.S. Appl. No. 09/892,837, (Jun. 27, 2001). |
NCR Corporation, “Teradata Performance Services”, Brochure, (2002). |
Quarterman, John S., et al., “Notable Computer Networks”, Communications of the ACM, vol. 29, No. 10, (1986),932-971. |
Selinger, P. G., et al., “Access Path Selection in a Relational Database Management System”, ACM, (1979),23-34. |
Wong, Eugene et al., “Decomposition—A Strategy for Query Processing”, ACM Transactions on Database Systems, vol. 1, No. 3, (Sep. 1, 1976),223-241. |