1. Field of the Invention
The present invention relates to the process of accessing a table within a database system. More specifically, the present invention relates to a method and an apparatus for using a hash-partitioned index to access a table that is not hash-partitioned.
2. Related Art
Many database applications use automatically generated keys as column values in relational tables. In such applications, the automatically generated key typically has a monotonically increasing value. For example, the transaction identifier in an Online Transaction Processing (OLTP) environment is usually an automatically generated key that is incremented to produce successive key values. Similarly, surrogate keys used in the star schemas of data warehousing environments also tend to be generated automatically with monotonically increasing values.
Indexes are often defined on these automatically generated, monotonically increasing keys. As a result, index accesses and maintenance activities tend to occur in a highly localized area of the index (e.g., the right-most edge of the index), because each new key is larger than every existing key. These localized areas are known as “hotspots.” A hotspot can cause severe resource contention during periods of increased database activity, which can degrade the performance of the database system, for example by increasing transaction response times and reducing throughput.
Two methods are presently used to reduce hotspots. The first method reverses the bytes in the key and then uses the reversed key to perform operations in the index. This method largely eliminates hotspots because it disperses the reversed keys across the whole index. One drawback of this approach is that, because the reversed keys are dispersed in a highly random fashion, the disk head moves continuously between unrelated locations; locality of reference is lost and disk seek time increases considerably. Thus, although this method eliminates hotspots, it may in fact degrade the overall performance of the system.
The second method is to use hash-partitioned tables, in which rows are mapped to partitions by applying a hash function to the partitioning key. Since each partition of a hash-partitioned table has its own index, index accesses and maintenance activities are distributed among all the partitions, thereby preventing the formation of a single large hotspot. The drawback of this method is that it forces the user to partition tables using hash partitioning, which may not be the optimal partitioning technique for certain database applications. For example, OLTP applications commonly use tables that are range-partitioned on date fields to simplify database management and to improve database performance. Such applications therefore cannot use hash partitioning to eliminate hotspots.
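The byte-reversal method can be illustrated with a minimal sketch, assuming 4-byte big-endian integer keys handed out by a sequence generator; `reverse_key` is a hypothetical helper, not an API of any particular database. The leading byte of a stored key determines where it sorts, so counting distinct leading bytes shows how widely consecutive inserts are scattered.

```python
def reverse_key(key: int, width: int = 4) -> bytes:
    """Big-endian bytes of `key`, reversed, as a reverse-key index would store them."""
    return key.to_bytes(width, "big")[::-1]


# 256 consecutive values from a hypothetical sequence generator.
keys = range(1000, 1256)

# The leading byte decides where an entry sorts, and therefore which leaf blocks it touches.
normal_leading = {k.to_bytes(4, "big")[0] for k in keys}
reverse_leading = {reverse_key(k)[0] for k in keys}

print(len(normal_leading))   # 1: every insert lands in the same narrow region (the hotspot)
print(len(reverse_leading))  # 256: inserts are spread across the whole key space
```

The same scattering that removes the hotspot is what destroys locality of reference: consecutive keys no longer share index blocks.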
Hence, what is needed is a method and an apparatus for accessing a table that does not have the above-described drawbacks of the existing techniques.
One embodiment of the present invention provides a system that uses an index that is hash-partitioned to access a table that is not hash-partitioned. During system operation, the database receives a request to perform an operation involving a table in the database. If performing the operation involves looking up a key in the hash-partitioned index, the database applies a hash function to the key to identify a unique partition within the hash-partitioned index for the key. Next, the database uses the key to perform a lookup in the identified index partition to identify zero or more rows of the table that match the key.
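The lookup path of this embodiment can be sketched as follows, under stated assumptions: the hash function is a stand-in (the first 8 bytes of SHA-256, reduced modulo the partition count), each index partition is modeled as a sorted list of `(key, rowid)` pairs in place of a B+-tree, and `HashPartitionedIndex` is a hypothetical name. The table itself is an ordinary unpartitioned mapping from row id to key.

```python
import bisect
import hashlib
from typing import List, Tuple


def hash_key(key: str) -> int:
    """Stable 64-bit hash: the first 8 bytes of SHA-256 (a stand-in for the hash function)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashPartitionedIndex:
    """Each partition is a sorted list of (key, rowid) pairs, standing in for a per-partition B+-tree."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, int]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Apply the hash function to identify the unique partition for this key.
        return hash_key(key) % len(self.partitions)

    def insert(self, key: str, rowid: int) -> None:
        bisect.insort(self.partitions[self.partition_for(key)], (key, rowid))

    def lookup(self, key: str) -> List[int]:
        # Search only the identified partition; return zero or more matching row ids.
        part = self.partitions[self.partition_for(key)]
        i = bisect.bisect_left(part, (key,))  # (key,) sorts before every (key, rowid)
        rowids = []
        while i < len(part) and part[i][0] == key:
            rowids.append(part[i][1])
            i += 1
        return rowids


# The table itself is unpartitioned: a plain mapping from row id to key value.
table = {100: "txn-0001", 101: "txn-0002", 102: "txn-0001"}
index = HashPartitionedIndex(4)
for rowid, key in table.items():
    index.insert(key, rowid)

print(index.lookup("txn-0001"))  # [100, 102]
print(index.lookup("txn-9999"))  # []
```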
In a variation on this embodiment, the table is not partitioned.
In a variation on this embodiment, the table is range-partitioned.
In a variation on this embodiment, the table is list-partitioned.
In a variation on this embodiment, the operation can include: querying the table to identify rows that match a logical condition; updating an existing row in the table; deleting an existing row in the table; inserting a new row in the table; creating a hash-partitioned index for the table; adding a partition to the hash-partitioned index; and coalescing a partition in the hash-partitioned index.
In a variation on this embodiment, the hash function is applied to a prefix of the key, instead of the entire key.
In a variation on this embodiment, identifying a unique partition for the key involves calculating a partition number.
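A minimal sketch of computing a partition number from a key prefix, assuming that "prefix" means the leading columns of a composite key and using SHA-256 as a stand-in hash; `partition_number` and the `(customer_id, order_id)` key are hypothetical. Because only the prefix is hashed, all keys that share a prefix map to the same partition.

```python
import hashlib
from typing import Tuple


def partition_number(key: Tuple, num_partitions: int, prefix_len: int = 1) -> int:
    """Partition number computed from the first `prefix_len` columns of a composite key."""
    prefix = repr(key[:prefix_len]).encode("utf-8")   # only the key prefix is hashed
    digest = hashlib.sha256(prefix).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical composite key (customer_id, order_id): hashing only customer_id
# keeps all of a customer's entries in one index partition.
p1 = partition_number((42, 1001), 8)
p2 = partition_number((42, 1002), 8)
print(p1 == p2)  # True
```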
In a variation on this embodiment, if performing the operation involves creating a hash-partitioned index, the database obtains a key for each row in the table, applies a hash function to the key to identify a unique partition within the hash-partitioned index for the key, and inserts the key into the identified partition.
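Index creation can be sketched as a single scan of the table, assuming a SHA-256 stand-in hash, sorted lists in place of per-partition B+-trees, and a hypothetical table whose `txn_id` column is indexed; `create_hash_partitioned_index` is an illustrative name. How the table itself is stored (unpartitioned or range-partitioned) does not affect the scan.

```python
import hashlib
from typing import Dict, List, Tuple


def hash_key(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def create_hash_partitioned_index(
    table: Dict[int, dict], column: str, num_partitions: int
) -> List[List[Tuple[str, int]]]:
    partitions: List[List[Tuple[str, int]]] = [[] for _ in range(num_partitions)]
    for rowid, row in table.items():            # scan every row of the table
        key = row[column]                       # obtain the key for the row
        p = hash_key(key) % num_partitions      # identify the unique partition
        partitions[p].append((key, rowid))      # insert the key into that partition
    for part in partitions:
        part.sort()                             # stand-in for building each partition's B+-tree
    return partitions


# A hypothetical transactions table; only txn_id is indexed here.
table = {rid: {"txn_id": f"T{rid:06d}", "txn_date": "2004-05-06"} for rid in range(1, 101)}
index = create_hash_partitioned_index(table, "txn_id", 4)
print([len(p) for p in index])
```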
In a variation on this embodiment, if performing the operation involves adding a partition to the hash-partitioned index, and if the hash function has the prefix property, the database identifies a source partition in the hash-partitioned index to be subdivided to create two new partitions to replace the source partition. If the source partition is marked usable, the system then subdivides the source partition by first scanning through all the keys in the source partition, applying a new hash function to each key in the source partition to identify one of the two new partitions, and inserting the key into the identified new partition. Finally, it replaces the source partition with the two new partitions, thereby creating an additional partition.
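The text does not spell out the prefix property, so this sketch assumes one common reading: each partition is named by the leading bits of a fixed 64-bit hash (SHA-256 as a stand-in), and the "new hash function" simply looks at one more bit, so every key of partition `p` goes to exactly one of `p+"0"` and `p+"1"`. `hash_bits` and `split_partition` are hypothetical names; only the source partition is scanned.

```python
import hashlib
from typing import Dict, List, Tuple

Partitions = Dict[str, List[Tuple[str, int]]]   # bit prefix -> sorted (key, rowid) entries


def hash_bits(key: str) -> str:
    """64-bit hash of the key as a string of '0'/'1' characters, most significant bit first."""
    h = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    return format(h, "064b")


def split_partition(partitions: Partitions, source: str) -> None:
    """Replace the usable partition `source` with `source+'0'` and `source+'1'`."""
    left: List[Tuple[str, int]] = []
    right: List[Tuple[str, int]] = []
    for key, rowid in partitions[source]:          # scan only the source partition
        next_bit = hash_bits(key)[len(source)]     # the "new hash function": one more bit
        (left if next_bit == "0" else right).append((key, rowid))
    del partitions[source]
    partitions[source + "0"] = left                # order is preserved, so both stay sorted
    partitions[source + "1"] = right


# Two partitions keyed on the first hash bit, then split partition "0" into "00" and "01".
parts: Partitions = {"0": [], "1": []}
for rowid in range(100):
    key = f"T{rowid:06d}"
    parts[hash_bits(key)[0]].append((key, rowid))
for entries in parts.values():
    entries.sort()
split_partition(parts, "0")
print(sorted(parts))  # ['00', '01', '1']
```

Partition `"1"` is untouched, which is the practical benefit of the prefix property: adding a partition costs one partition scan, not a full rebuild.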
In a variation on this embodiment, if performing the operation involves adding a partition to the hash-partitioned index, and if the hash function has the prefix property, the database identifies a source partition in the hash-partitioned index to be subdivided to create two new partitions to replace the source partition. If the source partition is marked unusable, the system then attempts to identify a second index, wherein the index key of the second index is a superset of the index key of the hash-partitioned index. If a second index is successfully identified, the system then subdivides the source partition by first scanning through one or more keys in the second index and applying the hash function to each key to determine whether the key maps to the source partition. Next, if the key maps to the source partition, the system applies a new hash function to the key to identify one of the two new partitions, and inserts the key into the identified new partition. Finally, it replaces the source partition with the two new partitions, thereby creating an additional partition.
In a variation on this embodiment, if performing the operation involves adding a partition to the hash-partitioned index, and if the hash function has the prefix property, the database identifies a source partition in the hash-partitioned index to be subdivided to create two new partitions to replace the source partition. If the source partition is marked unusable, the system then subdivides the source partition by first scanning through all the rows in the table, obtaining a key for each row, and applying the hash function to the key to determine whether the key maps to the source partition. Next, if the key maps to the source partition, the system applies a new hash function to the key to identify one of the two new partitions, and inserts the key into the identified new partition. Finally, it replaces the source partition with the two new partitions, thereby creating an additional partition.
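When the source partition is unusable, its entries must be recovered from another scan. The sketch below covers both variations above under the same assumed prefix reading as a bit-string prefix: one function accepts `(key, rowid)` pairs either projected from a hypothetical second index on `(txn_id, txn_date)` (assumed to have the hash-partitioned key as its leading column) or read directly from the table. The old hash (prefix test) filters out keys belonging elsewhere, and one extra bit picks the new partition. All names are illustrative.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Partitions = Dict[str, List[Tuple[str, int]]]


def hash_bits(key: str) -> str:
    h = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    return format(h, "064b")


def split_by_rescan(partitions: Partitions, source: str, entries: Iterable[Tuple[str, int]]) -> None:
    """Rebuild an unusable `source` partition as two children from (key, rowid) pairs
    produced by scanning a second index or the table itself."""
    left: List[Tuple[str, int]] = []
    right: List[Tuple[str, int]] = []
    for key, rowid in entries:
        bits = hash_bits(key)
        if not bits.startswith(source):            # old hash: key belongs to another partition
            continue
        (left if bits[len(source)] == "0" else right).append((key, rowid))
    del partitions[source]
    partitions[source + "0"] = sorted(left)
    partitions[source + "1"] = sorted(right)


# Hypothetical table and a second index on (txn_id, txn_date), a superset of the index key.
table = {rid: {"txn_id": f"T{rid:06d}", "txn_date": f"2004-05-{rid % 28 + 1:02d}"} for rid in range(100)}
second_index = sorted(((r["txn_id"], r["txn_date"]), rid) for rid, r in table.items())

parts: Partitions = {"0": [], "1": []}       # partition "0" is unusable (its contents are lost)
for rid, r in table.items():
    if hash_bits(r["txn_id"]).startswith("1"):
        parts["1"].append((r["txn_id"], rid))
parts["1"].sort()

# Option 1: project each second-index key onto its leading column (the hash-partitioned key).
split_by_rescan(parts, "0", ((ck[0], rid) for ck, rid in second_index))
# Option 2 would pass ((r["txn_id"], rid) for rid, r in table.items()) instead.
print(sorted(parts))  # ['00', '01', '1']
```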
In a variation on this embodiment, if performing the operation involves adding a partition to the hash-partitioned index, and if the hash function does not have the prefix property, the database creates a set of new partitions to replace the set of existing partitions, wherein the number of new partitions is one greater than the number of existing partitions. It then scans through all the keys in the existing partitions, applies a new hash function to each key to identify a unique partition within the set of new partitions for the key, and inserts the key into the identified partition. Finally, it replaces the existing set of partitions with the new set of partitions, thereby increasing the number of partitions by one.
In a variation on this embodiment, if performing the operation involves adding a partition to the hash-partitioned index, and if the hash function does not have the prefix property, the database creates a set of new partitions to replace the set of existing partitions, wherein the number of new partitions is one greater than the number of existing partitions. It then scans through all the rows in the table, obtains a key for each row, applies a new hash function to the key to identify a unique partition within the set of new partitions for the key, and inserts the key into the identified partition. Finally, it replaces the existing set of partitions with the new set of partitions, thereby increasing the number of partitions by one.
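Without the prefix property, a partition cannot be split in isolation, so every entry is redistributed. This sketch assumes a plain `hash % count` placement with SHA-256 as a stand-in hash; `rebuild_partitions` is a hypothetical helper that takes its `(key, rowid)` pairs either from the existing partitions or from a table scan, matching the two variations above.

```python
import hashlib
from typing import Iterable, List, Tuple


def hash_key(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def rebuild_partitions(entries: Iterable[Tuple[str, int]], new_count: int) -> List[List[Tuple[str, int]]]:
    """Redistribute every (key, rowid) pair into `new_count` partitions using hash % new_count."""
    new_parts: List[List[Tuple[str, int]]] = [[] for _ in range(new_count)]
    for key, rowid in entries:
        new_parts[hash_key(key) % new_count].append((key, rowid))
    for part in new_parts:
        part.sort()
    return new_parts


table = {rid: f"T{rid:06d}" for rid in range(200)}    # hypothetical rowid -> key
old = rebuild_partitions(((k, r) for r, k in table.items()), 3)

# Source 1: scan the keys already stored in the existing partitions.
grown = rebuild_partitions((e for part in old for e in part), len(old) + 1)
# Source 2: scan the table rows instead.
grown_from_table = rebuild_partitions(((k, r) for r, k in table.items()), len(old) + 1)
print(len(old), len(grown), grown == grown_from_table)  # 3 4 True
```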
In a variation on this embodiment, if performing the operation involves coalescing a partition in the hash-partitioned index, and if the hash function has the prefix property, the database identifies two source partitions in the hash-partitioned index that share the same prefix to be coalesced to create a single new partition. It then coalesces the two source partitions to create the new partition by scanning through all the keys in the two source partitions, and inserting the keys into the single new partition. Finally, it replaces the two source partitions with the new partition, thereby reducing the number of partitions by one.
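Coalescing is the inverse of a split. Under the same assumed bit-prefix reading of the prefix property (SHA-256 stand-in hash, hypothetical names), the two source partitions sharing a prefix are the siblings `parent+"0"` and `parent+"1"`. Because each source is already sorted, the sketch merges them in one linear pass with `heapq.merge`.

```python
import hashlib
import heapq
from typing import Dict, List, Tuple

Partitions = Dict[str, List[Tuple[str, int]]]


def hash_bits(key: str) -> str:
    h = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    return format(h, "064b")


def coalesce_siblings(partitions: Partitions, parent: str) -> None:
    """Replace sibling partitions parent+'0' and parent+'1' with one partition `parent`."""
    left = partitions.pop(parent + "0")
    right = partitions.pop(parent + "1")
    # Both inputs are sorted, so a linear merge keeps the new partition sorted.
    partitions[parent] = list(heapq.merge(left, right))


parts: Partitions = {"00": [], "01": [], "1": []}
for rowid in range(100):
    key = f"T{rowid:06d}"
    bits = hash_bits(key)
    parts["1" if bits[0] == "1" else bits[:2]].append((key, rowid))
for entries in parts.values():
    entries.sort()
coalesce_siblings(parts, "0")
print(sorted(parts))  # ['0', '1']
```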
In a variation on this embodiment, if performing the operation involves coalescing a partition in the hash-partitioned index, and if the hash function has the prefix property, the database identifies two source partitions in the hash-partitioned index that share the same prefix to be coalesced to create a single new partition. If at least one of the two source partitions is marked unusable, the system then attempts to identify a second index, wherein the index key of the second index is a superset of the index key of the hash-partitioned index. If a second index is successfully identified, the system then coalesces the two source partitions by first scanning through one or more keys in the second index and applying the hash function to each key to determine whether the key maps to one of the source partitions. Next, if the key maps to one of the source partitions, the system inserts the key into the new partition. Finally, it replaces the two source partitions with the new partition, thereby reducing the number of partitions by one.
In a variation on this embodiment, if performing the operation involves coalescing a partition in the hash-partitioned index, and if the hash function has the prefix property, the database identifies two source partitions in the hash-partitioned index that share the same prefix to be coalesced to create a single new partition. If at least one of the two source partitions is marked unusable, the system then coalesces the two source partitions by first scanning through all the rows in the table, obtaining a key for each row, and applying the hash function to the key to determine whether the key maps to one of the source partitions. Next, if the key maps to one of the source partitions, the system inserts the key into the new partition. Finally, it replaces the two source partitions with the new partition, thereby reducing the number of partitions by one.
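For the two unusable-source variations above, a sketch under the same assumptions (bit-prefix partitions, SHA-256 stand-in hash, hypothetical names): a key maps to one of the two sibling sources exactly when its hash bits start with their shared prefix, so one prefix test suffices. The `(key, rowid)` pairs can come from a hypothetical second index on `(txn_id, branch)`, projected onto its assumed leading column, or directly from the table rows.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Partitions = Dict[str, List[Tuple[str, int]]]


def hash_bits(key: str) -> str:
    h = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    return format(h, "064b")


def coalesce_by_rescan(partitions: Partitions, parent: str, entries: Iterable[Tuple[str, int]]) -> None:
    """Rebuild siblings parent+'0' and parent+'1' (at least one unusable) as one partition."""
    # A key maps to one of the two sources exactly when its hash bits start with `parent`.
    merged = [(key, rowid) for key, rowid in entries if hash_bits(key).startswith(parent)]
    del partitions[parent + "0"], partitions[parent + "1"]
    partitions[parent] = sorted(merged)


table = {rid: {"txn_id": f"T{rid:06d}", "branch": rid % 7} for rid in range(100)}
second_index = sorted(((r["txn_id"], r["branch"]), rid) for rid, r in table.items())

# Partition "00" is unusable (empty here); "01" and "1" are intact.
parts: Partitions = {"00": [], "01": [], "1": []}
for rid, r in table.items():
    bits = hash_bits(r["txn_id"])
    if bits[:2] != "00":
        parts["1" if bits[0] == "1" else "01"].append((r["txn_id"], rid))
for entries in parts.values():
    entries.sort()

# Scan the second index, projecting its key onto the leading column; scanning
# the table rows, ((r["txn_id"], rid) for rid, r in table.items()), works the same way.
coalesce_by_rescan(parts, "0", ((ck[0], rid) for ck, rid in second_index))
print(sorted(parts))  # ['0', '1']
```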
In a variation on this embodiment, if performing the operation involves coalescing a partition in the hash-partitioned index, and if the hash function does not have the prefix property, the database creates a set of new partitions to replace the set of existing partitions, wherein the number of new partitions is one less than the number of existing partitions. It then scans through all the keys in the existing partitions, applies a new hash function to each key to identify a unique partition within the set of new partitions for the key, and inserts the key into the identified partition. Finally, it replaces the existing set of partitions with the new set of partitions, thereby reducing the number of partitions by one.
In a variation on this embodiment, if performing the operation involves coalescing a partition in the hash-partitioned index, and if the hash function does not have the prefix property, the database creates a set of new partitions to replace the set of existing partitions, wherein the number of new partitions is one less than the number of existing partitions. It then scans through all the rows in the table, obtains a key for each row, applies a new hash function to the key to identify a unique partition within the set of new partitions for the key, and inserts the key into the identified partition. Finally, it replaces the existing set of partitions with the new set of partitions, thereby reducing the number of partitions by one.
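The sketch below, assuming plain `hash % count` placement with SHA-256 as a stand-in hash and a hypothetical `rebuild_partitions` helper, shrinks four partitions to three. It also counts how many keys change partition, which illustrates why, without the prefix property, every existing partition (or the whole table) must be scanned.

```python
import hashlib
from typing import Iterable, List, Tuple


def hash_key(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def rebuild_partitions(entries: Iterable[Tuple[str, int]], new_count: int) -> List[List[Tuple[str, int]]]:
    """Redistribute every (key, rowid) pair into `new_count` partitions using hash % new_count."""
    new_parts: List[List[Tuple[str, int]]] = [[] for _ in range(new_count)]
    for key, rowid in entries:
        new_parts[hash_key(key) % new_count].append((key, rowid))
    for part in new_parts:
        part.sort()
    return new_parts


table = {rid: f"T{rid:06d}" for rid in range(1000)}   # hypothetical rowid -> key
old = rebuild_partitions(((k, r) for r, k in table.items()), 4)
shrunk = rebuild_partitions((e for part in old for e in part), len(old) - 1)

# Count keys whose partition changes when going from 4 partitions to 3.
moved = sum(1 for k in table.values() if hash_key(k) % 4 != hash_key(k) % 3)
print(len(shrunk), moved)
```

A key keeps its partition only when `h % 4 == h % 3`, which holds for 3 of every 12 residues of `h`, so roughly three quarters of the keys are expected to move.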
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
The data structures and code described in this detailed description are typically stored on a computer readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs) and DVDs (digital versatile discs or digital video discs).
Computer System
Database 104 can include any type of system for storing data in non-volatile storage. This includes, but is not limited to, database systems based upon magnetic, optical, and magneto-optical storage devices, as well as storage devices based on flash memory and/or battery-backed memory. Database 104 includes a non-partitioned table 106 comprising a collection of rows 108. Table 106 can be referenced through one or more hash-partitioned indexes, such as index 110. Note that indexes, including hash-partitioned indexes, provide a quick way to find rows with specific column values. In the absence of indexes, database 104 would have to scan the whole of table 106 to identify the rows 108 that match specific column values, which is very inefficient for large tables. A hash-partitioned index contains two or more index partitions (112, 114, 116, and 118), and each index partition, such as index partition 118, contains index records 120 that identify the locations of rows 108 in table 106. Index records 120 are typically stored in a tree data structure, such as a B+-tree, that uses keys to support efficient lookup and insert operations.
Accessing a Table
Note that, although index 110 is hash-partitioned, table 106 may be unpartitioned or may be partitioned using a different technique. For example, table 106 may be range-partitioned, as is often the case in OLTP applications. Furthermore, because index 110 is hash-partitioned, index accesses and maintenance activities are spread across all the partitions (112, 114, 116, and 118), which prevents a single large hotspot from forming.
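To see how hash partitioning breaks up the right-edge hotspot, this sketch (assuming SHA-256 as a stand-in hash, four index partitions, and a hypothetical `partition_for` helper) routes 1,000 consecutive transaction ids to partitions and counts the inserts each partition receives. Within each partition the keys still increase, so each partition has its own right edge, but successive inserts alternate among partitions and the contention is divided among them.

```python
import hashlib
from collections import Counter


def partition_for(txn_id: int, num_partitions: int) -> int:
    digest = hashlib.sha256(str(txn_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# 1,000 consecutive transaction ids, as a sequence generator would hand them out.
txn_ids = range(500_000, 501_000)

# Successive inserts land in different partitions, so the single right-edge
# hotspot of an unpartitioned index is split four ways.
counts = Counter(partition_for(t, 4) for t in txn_ids)
print(sorted(counts.items()))
```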
Creating a Hash-Partitioned Index
Adding a Partition
Coalescing a Partition
The foregoing descriptions of embodiments of the present invention have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the present invention to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the present invention is defined by the appended claims.
This application hereby claims priority under 35 U.S.C. §119 to U.S. Provisional Patent Application No. 60/569,122 filed on 6 May 2004, entitled “Eliminating Resource Contention During Maintenance of an Index Defined on Monotonically Increasing Keys,” by inventor Vikram Shukla.
Number | Date | Country
---|---|---
20050251524 A1 | Nov 2005 | US

Number | Date | Country
---|---|---
60569122 | May 2004 | US