1. Technical Field
This disclosure generally relates to computer systems, and more specifically relates to database systems.
2. Background Art
Database systems have been developed that allow a computer to store a large amount of information in a way that allows a user to search for and retrieve specific information in the database. For example, an insurance company may have a database that includes all of its policy holders and their current account information, including payment history, premium amount, policy number, policy type, exclusions to coverage, etc. A database system allows the insurance company to retrieve the account information for a single policy holder among the thousands and perhaps millions of policy holders in its database. Retrieval of information from a database is typically done using queries. A database query typically includes one or more predicate expressions interconnected with logical operators.
Structured Query Language (SQL) provides a way to write queries to a database. SQL supports user-defined functions. A user-defined function (UDF) in SQL gives the programmer the capability to provide an encapsulated reusable cell in the SQL language. User defined functions can perform any service the programmer wants to implement. The services in a UDF may be very simple or may be quite complex.
A UDF in SQL can be deterministic or non-deterministic. A deterministic UDF is a UDF that is predictable, meaning a set of inputs to the UDF will always produce the same result. A non-deterministic UDF is one that does not always produce the same result given a set of inputs. The disclosure and claims herein deal with deterministic UDFs. Unless otherwise noted herein, the term “UDF” as used herein means a deterministic UDF.
Because a deterministic UDF always returns the same result given a particular set of inputs, a database optimizer may cache the results of executing one or more portions of a UDF. By caching the results, the results may be retrieved and reused later for the same query that has the same inputs. One way that has been used in the art to cache UDF results uses a hash table. In the known implementations of a deterministic UDF, there is one hash table assigned for use with the UDF, which is good for the life of the query. Let's assume for the sake of illustration a UDF invokes function XYZ three times on three individual columns. If the cardinality of the three columns is a disjoint set, meaning they have no values in common, using one hash table for this UDF will provide terrible performance. Without a way to process deterministic UDFs in a more optimized way, processing deterministic UDFs will continue to be inefficient in known database systems.
A deterministic UDF processing mechanism processes user-defined functions (UDFs) using multiple hash tables. Data access patterns for a UDF are collected, and an appropriate hash table set is then determined for the UDF from the data access patterns. If a UDF accesses some similar columns and some disjoint columns, the similar columns are grouped together, and one or more hash tables are allocated to the similar columns. Disjoint columns are allocated their own hash tables. In addition, the allocation of hash tables may be adjusted based on historical access patterns collected over time. By dynamically allocating and adjusting sets of hash tables to a deterministic UDF, the performance of the UDF is greatly increased.
The foregoing and other features and advantages will be apparent from the following more particular description, as illustrated in the accompanying drawings.
The disclosure will be described in conjunction with the appended drawings, where like designations denote like elements, and:
The claims and disclosure herein provide a way to allocate multiple hash tables to a user-defined function (UDF) to improve performance in processing the UDF. The number and allocation of the hash tables is according to historical access patterns that indicate the history of how the UDF has been processed in the past. The number and allocation of hash tables may also be dynamically adjusted over time according to historical access patterns, thereby allowing the UDF to evolve according to actual usage of the UDF. In this manner, the performance of processing a UDF is optimized.
Referring to
Main memory 120 preferably contains data 121, an operating system 122, a database 123, one or more queries 124, a deterministic user-defined function (UDF) 125, historical access patterns 126, and a deterministic UDF processing mechanism 127. Data 121 represents any data that serves as input to or output from any program in computer system 100. Operating system 122 is a multitasking operating system known in the industry as i5/OS; however, those skilled in the art will appreciate that the spirit and scope of this disclosure is not limited to any one operating system. Database 123 is any suitable database, whether currently known or developed in the future. Database 123 preferably includes one or more tables that have one or more columns. Queries 124 include any suitable database query, including an SQL query. Deterministic UDF 125 is a user-defined function that may be executed by itself, or may be referenced in one or more query 124. Historical access patterns 126 indicate how data was accessed in the past in processing the deterministic UDF 125. The historical access patterns 126 are collected by the access pattern collection mechanism 128. The deterministic UDF processing mechanism 127 improves the processing of UDF 125 by allocating multiple hash tables to a UDF according to the historical access patterns 126 collected by the access pattern collection mechanism 128. The hash table set determination mechanism 129 determines how many hash tables are needed for a given UDF, and how these hash tables should be allocated to different columns accessed by the UDF.
Computer system 100 utilizes well known virtual addressing mechanisms that allow the programs of computer system 100 to behave as if they only have access to a large, single storage entity instead of access to multiple, smaller storage entities such as main memory 120 and DASD device 155. Therefore, while data 121, operating system 122, database 123, queries 124, deterministic UDF 125, historical access patterns 126, and deterministic UDF processing mechanism 127 are shown to reside in main memory 120, those skilled in the art will recognize that these items are not necessarily all completely contained in main memory 120 at the same time. It should also be noted that the term “memory” is used herein generically to refer to the entire virtual memory of computer system 100, and may include the virtual memory of other computer systems coupled to computer system 100.
Processor 110 may be constructed from one or more microprocessors and/or integrated circuits. Processor 110 executes program instructions stored in main memory 120. Main memory 120 stores programs and data that processor 110 may access. When computer system 100 starts up, processor 110 initially executes the program instructions that make up operating system 122.
Although computer system 100 is shown to contain only a single processor and a single system bus, those skilled in the art will appreciate that a deterministic UDF processing mechanism may be practiced using a computer system that has multiple processors and/or multiple buses. In addition, the interfaces that are used preferably each include separate, fully programmed microprocessors that are used to off-load compute-intensive processing from processor 110. However, those skilled in the art will appreciate that these functions may be performed using I/O adapters as well.
Display interface 140 is used to directly connect one or more displays 165 to computer system 100. These displays 165, which may be non-intelligent (i.e., dumb) terminals or fully programmable workstations, are used to provide system administrators and users the ability to communicate with computer system 100. Note, however, that while display interface 140 is provided to support communication with one or more displays 165, computer system 100 does not necessarily require a display 165, because all needed interaction with users and other processes may occur via network interface 150.
Network interface 150 is used to connect computer system 100 to other computer systems or workstations 175 via network 170. Network interface 150 broadly represents any suitable way to interconnect electronic devices, regardless of whether the network 170 comprises present-day analog and/or digital techniques or via some networking mechanism of the future. In addition, many different network protocols can be used to implement a network. These protocols are specialized computer programs that allow computers to communicate across a network. TCP/IP (Transmission Control Protocol/Internet Protocol) is an example of a suitable network protocol.
At this point, it is important to note that while the description above is in the context of a fully functional computer system, those skilled in the art will appreciate that the deterministic UDF processing mechanism may be distributed as an article of manufacture in a variety of forms, and the claims extend to all suitable types of computer-readable media that bear instructions that may be executed by a computer. Examples of suitable computer-readable media include recordable media such as floppy disks and CD-RW (e.g., 195 of
The deterministic UDF processing mechanism may also be delivered as part of a service engagement with a client corporation, nonprofit organization, government entity, internal organizational structure, or the like. This may include configuring a computer system to perform some or all of the methods described herein, and deploying software, hardware, and web services that implement some or all of the methods described herein. This may also include analyzing the client's operations, creating recommendations responsive to the analysis, building systems that implement portions of the recommendations, integrating the systems into existing processes and infrastructure, metering use of the systems, allocating expenses to users of the systems, and billing for use of the systems.
Referring to
Referring to
Method 600 in
Referring to
Referring to
Another suitable improvement may be made by determining whether or not saving the allocation of hash tables for other queries would be beneficial. In the prior art, a hash table is allocated for a single query. In the disclosure and claims herein, the hash table could be used for other queries as well. If saving the allocation of hash tables for other queries is beneficial (step 830=YES), the allocation of hash tables is saved for other queries (step 840). If saving the allocation of hash tables for other queries is not beneficial (step 830=NO), method 800 is done.
Several user-defined criteria could be used to determine whether saving the allocation of hash tables for other queries is beneficial in step 830 in
The deterministic UDF processing mechanism disclosed and claimed herein specifies multiple hash tables for a single UDF to enhance the speed of processing the UDF. The multiple hash tables may be allocated to the UDF according to historical access patterns of the UDF. In addition, the allocation of hash tables to a UDF may be dynamically adjusted according to historical access patterns as time goes forward.
One skilled in the art will appreciate that many variations are possible within the scope of the claims. Thus, while the disclosure is particularly shown and described above, it will be understood by those skilled in the art that these and other changes in form and details may be made therein without departing from the spirit and scope of the claims.
Number | Name | Date | Kind |
---|---|---|---|
5511190 | Sharma et al. | Apr 1996 | A |
6009432 | Tarin | Dec 1999 | A |
6263331 | Liu et al. | Jul 2001 | B1 |
6301575 | Chadha et al. | Oct 2001 | B1 |
6338056 | Dessloch et al. | Jan 2002 | B1 |
6353820 | Edwards et al. | Mar 2002 | B1 |
6356888 | Egan et al. | Mar 2002 | B1 |
6598051 | Wiener et al. | Jul 2003 | B1 |
6701317 | Wiener et al. | Mar 2004 | B1 |
7058639 | Chatterjee et al. | Jun 2006 | B1 |
7103608 | Ozbutun et al. | Sep 2006 | B1 |
7120864 | Cai et al. | Oct 2006 | B2 |
7145906 | Fenner | Dec 2006 | B2 |
7152165 | Maheshwari et al. | Dec 2006 | B1 |
7171427 | Witkowski et al. | Jan 2007 | B2 |
7313565 | Zhang et al. | Dec 2007 | B2 |
7313575 | Carr et al. | Dec 2007 | B2 |
7499912 | Hershkovich et al. | Mar 2009 | B2 |
7606792 | Cunningham et al. | Oct 2009 | B2 |
7653670 | Hasan et al. | Jan 2010 | B2 |
7681240 | Maheshwari et al. | Mar 2010 | B2 |
20040006574 | Witkowski et al. | Jan 2004 | A1 |
20040034616 | Witkowski et al. | Feb 2004 | A1 |
20040172267 | Patel et al. | Sep 2004 | A1 |
20050091443 | Hershkovich et al. | Apr 2005 | A1 |
20050119837 | Prakash et al. | Jun 2005 | A1 |
20050166140 | Cai et al. | Jul 2005 | A1 |
20050187946 | Zhang et al. | Aug 2005 | A1 |
20050210023 | Barrera et al. | Sep 2005 | A1 |
20050278270 | Carr et al. | Dec 2005 | A1 |
20060041544 | Santosuosso | Feb 2006 | A1 |
20060123249 | Maheshwari et al. | Jun 2006 | A1 |
20060123250 | Maheshwari et al. | Jun 2006 | A1 |
20060248063 | Gordon | Nov 2006 | A1 |
20060259344 | Patel et al. | Nov 2006 | A1 |
20070036309 | Zoldi et al. | Feb 2007 | A1 |
20070136331 | Hasan et al. | Jun 2007 | A1 |
20080082628 | Rowstron et al. | Apr 2008 | A1 |
20080097972 | Gordon | Apr 2008 | A1 |
20080133446 | Dubnicki et al. | Jun 2008 | A1 |
20080195577 | Fan et al. | Aug 2008 | A1 |
20080216051 | Johnson et al. | Sep 2008 | A1 |
Number | Date | Country | |
---|---|---|---|
20090204566 A1 | Aug 2009 | US |