1. Field
The present disclosure relates to hash table data structures. More particularly, the disclosure concerns adaptive hash table resizing for hash tables that support concurrent access by readers and writers using the read-copy update synchronization mechanism.
2. Description of the Prior Art
By way of background, hash tables provide useful data structures for many applications, with various convenient properties such as constant average time for accesses and modifications. When a hash table is shared for reading and writing by concurrent applications, a suitable synchronization mechanism is required to maintain internal consistency. One technique for supporting concurrent hash table access comes in the form of Read-Copy Update (RCU). RCU is a synchronization mechanism with very low overhead for readers, and thus works particularly well for data structures with significantly more reads than writes, such as hash tables. These properties allow RCU-protected hash tables to scale well to many threads on many processors.
RCU-protected hash tables are implemented using open chaining, with RCU-protected linked lists being provided for the hash buckets. Readers traverse these linked lists without using locks, atomic operations or other forms of mutual exclusion. Writers performing updates to hash table elements protect the readers by waiting for a grace period to elapse before freeing any stale data that the readers may have been referencing.
A challenge respecting RCU-protected hash tables is the need to support efficient hash table resizing. The need to resize a hash table dynamically stems from the fact that the performance and suitability of hash tables depend heavily on choosing the appropriate size for the table. Making a hash table too small will lead to excessively long hash chains and poor performance. Making a hash table too large will consume too much memory, reducing the memory available for other applications or performance-improving caches, and increasing hardware requirements. Many systems and applications cannot know the proper size of a hash table in advance. Software designed for use on a wide range of system configurations with varying needs may not have the option of choosing a single hash table size suitable for all supported system configurations. Furthermore, the needs of a system may change at run time due to numerous factors, and software must scale both up and down dynamically to meet these needs. For example, in a system that supports virtual computing environments, the ability to shrink a hash table can be particularly important so that memory can be reallocated from one virtual environment to another.
Resizing an RCU-protected hash table so as to either increase or decrease the hash table size results in hash buckets being respectively added to or removed from the hash table, with a corresponding change being made to the hash function. This usually entails one or more hash table elements having to be relocated to a different hash bucket, which can be disruptive to readers if care is not taken to protect their operations during the resizing operation. Existing RCU-protected hash tables support reader-friendly hash table resizing using several approaches. However, there are shortcomings that are variously associated with these approaches, such as (1) the need to maintain duplicate sets of per-element list links, thereby increasing the hash table memory footprint, (2) the need to incur large numbers of grace period delays and require readers to search two hash table versions during resizing, and (3) the need to copy hash table elements, which makes it difficult or impossible for readers to maintain long-lived references to such elements. The present disclosure presents a new technique that enables optimized resizing of RCU-protected hash tables while permitting concurrent read access without any of the above deficiencies.
A method, system and computer program product are provided for resizing a first RCU-protected hash table stored in a memory. According to the disclosed technique, a second RCU-protected hash table is allocated in the memory. The second hash table represents a resized version of the first hash table that has a different number of hash buckets than the first hash table, the second hash table buckets being defined but initially having no hash table elements. The second hash table is populated by linking each hash bucket of the second hash table to all hash buckets of the first hash table containing elements that hash to the second hash table bucket. The second hash table is then published so that it is available for searching by hash table readers. The first hash table is freed from memory after waiting for a grace period which guarantees that no readers searching the first hash table will be affected by the freeing.
In an embodiment, the second hash table has a size that is an integral factor of a size of the first hash table. In a further embodiment, the resizing comprises shrinking the hash table. In that case, (1) a hash function is selected for the second hash table so that elements of a given hash bucket of the first hash table map to a single hash bucket of the second hash table, and (2) the second hash table bucket links to a first hash bucket of the first hash table that in turn links to at least one additional hash bucket of the first hash table, such that the second hash table bucket chains through the different buckets of the first hash table whose elements map to the second hash table bucket.
In a further embodiment, the resizing comprises expanding the hash table. In that case, (1) a hash function is selected for the second hash table so that elements of a given hash bucket in the first hash table map to a predictable set of hash buckets of the second hash table, and (2) at least two hash buckets of the second hash table link to the same hash bucket of the first hash table due to the first hash table bucket containing elements that map to different hash buckets of the second hash table. In accordance with this embodiment, the disclosed technique may further include separating the hash bucket of the first hash table into the hash buckets of said second hash table. The separating may be performed by de-linking chains of elements in the hash bucket of the first hash table that respectively hash to different hash buckets of the second hash table. In particular, the separation may be performed by successively changing links from elements in the linked list representing the first hash table bucket to point to the next element of the linked list that hashes to the same bucket in the second hash table, or to point to a bucket-ending sentinel value (e.g., NULL). The separating includes waiting for a grace period before de-linking any two of the chains from each other, the grace period guaranteeing that no readers searching the second hash table will be affected by the de-linking.
The foregoing and other features and advantages will be apparent from the following more particular description of example embodiments, as illustrated in the accompanying Drawings, in which:
Introduction
Example embodiments will now be described for dynamically resizing RCU-protected hash tables in a manner that optimizes the resizing operation by conserving memory resources and minimizing both reader and writer overhead. The RCU-protected hash table resizing technique disclosed herein offers the following advantages:
In order to achieve these benefits, an approach is taken wherein any resizing-induced changes to the hash function are restricted so that a given hash bucket in the hash table prior to resizing will map to a predictable bucket or set of buckets in the hash table subsequent to resizing. This restriction allows a hash table to be resized using cross-linking operations in which the hash table elements are neither copied nor moved around in memory. Instead, resizing occurs in an incremental fashion so that readers see consistent hash bucket lists with all applicable hash table elements at all times. The approach waits for grace periods between certain steps of the resizing operation in order to guarantee that readers see a sufficiently consistent view of the hash table. Using the disclosed technique, shrinking a hash table requires only a single grace period. Enlarging a hash table requires only a limited number of grace periods that does not exceed the number of hash table elements in the longest hash chain.
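By way of illustration only, the restriction on the hash function may be sketched in C as follows. The sketch assumes a simple modulo hash over pre-hashed integer keys and table sizes that are integer multiples of one another. The function names bucket_of, shrink_target and expand_source are hypothetical and are not part of the disclosure.

```c
#include <assert.h>
#include <stddef.h>

/* Bucket index under a modulo hash (assumed hash function for this sketch). */
static size_t bucket_of(unsigned long hash, size_t nbuckets)
{
    assert(nbuckets > 0);
    return hash % nbuckets;
}

/*
 * Shrinking: when new_size divides old_size, every element of a given
 * old bucket lands in exactly one new bucket.
 */
static size_t shrink_target(size_t old_bucket, size_t old_size, size_t new_size)
{
    assert(new_size > 0 && old_size % new_size == 0 && old_bucket < old_size);
    return old_bucket % new_size;
}

/*
 * Expanding: when old_size divides new_size, every element of a given
 * new bucket comes from exactly one old bucket, so the elements of old
 * bucket b spread over the predictable set b, b + old_size, ...
 */
static size_t expand_source(size_t new_bucket, size_t old_size, size_t new_size)
{
    assert(old_size > 0 && new_size % old_size == 0 && new_bucket < new_size);
    return new_bucket % old_size;
}
```

With such a mapping, an updater can determine which buckets must be cross-linked by inspecting bucket indices alone, without rehashing the elements into a separate structure.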
Resizing an RCU-Protected Hash Table by Shrinking
To shrink an RCU-protected hash table, an updater may perform the example operations 2-16 shown in
As shown in block 2 of
As shown in block 8 of
At this point, if a reader were to access the new hash table H2, it would find all of the elements of the original hash table H1. It is therefore safe to set the size of the new hash table H2 and publish it as a valid hash table that replaces the original hash table H1 (e.g., using the rcu_assign_pointer( ) primitive). These operations are shown in blocks 12 and 14 of
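The shrinking operation described above may be sketched in C as follows. This is a minimal single-threaded illustration under stated assumptions, not a definitive implementation: the struct layouts, the modulo hash, and the names shrink_table, publish_pointer and wait_for_grace_period are hypothetical. The latter two are stand-ins for primitives such as rcu_assign_pointer( ) and synchronize_rcu( ), which a concurrent implementation would need to use instead.

```c
#include <assert.h>
#include <stdlib.h>

struct node  { unsigned long key; struct node *next; };
struct table { size_t size; struct node **buckets; };

/* Stand-in for rcu_assign_pointer(); a plain store is NOT safe with real readers. */
#define publish_pointer(p, v) ((p) = (v))
/* Stand-in for synchronize_rcu(); this sketch has no concurrent readers. */
static void wait_for_grace_period(void) { }

/* Shrink *tablep to new_size buckets; new_size must divide the old size. */
static struct table *shrink_table(struct table **tablep, size_t new_size)
{
    struct table *old = *tablep;
    assert(new_size > 0 && old->size % new_size == 0);

    struct table *nt = malloc(sizeof(*nt));
    if (!nt)
        return NULL;
    nt->size = new_size;
    nt->buckets = calloc(new_size, sizeof(*nt->buckets));
    if (!nt->buckets) {
        free(nt);
        return NULL;
    }

    for (size_t j = 0; j < new_size; j++) {
        /* Link that will receive the head of the next old chain. */
        struct node **tail = &nt->buckets[j];

        /* Old buckets j, j + new_size, ... all map to new bucket j. */
        for (size_t b = j; b < old->size; b += new_size) {
            struct node *n = old->buckets[b];
            if (!n)
                continue;
            /* Cross-link: elements are neither copied nor moved. */
            publish_pointer(*tail, n);
            while (n->next)
                n = n->next;
            tail = &n->next;
        }
    }

    publish_pointer(*tablep, nt);   /* readers may now search the new table */
    wait_for_grace_period();        /* the single grace period for a shrink */
    free(old->buckets);             /* only the old bucket array is freed */
    free(old);
    return nt;
}
```

A reader still searching the original table while the chains are being joined may encounter extra elements at the end of its bucket, which is harmless because each element's key is compared during traversal.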
Resizing an RCU-Protected Hash Table by Expanding
To expand an RCU-protected hash table, an updater may perform the example operations 20-44 shown in
As shown in block 22 of
As shown in block 28 of
Blocks 34-44 of
Before reiterating blocks 38-42 with respect to the next bucket in the new hash table H2 (per block 36), block 44 waits for a grace period (e.g., by calling a primitive such as synchronize_rcu( ) or synchronize_rcu_expedited( )). The grace period is needed because the next iteration will link element n2 to element n4, thereby removing the existing link from element n2 to element n3. Without the grace period, a reader that is referencing element n2 but searching for odd-numbered hash table elements would be unable to continue its search when element n2 is relinked from element n3 to element n4.
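One possible arrangement of the expansion and chain-separation steps may be sketched in C as follows. The sketch is single-threaded and makes several assumptions: a modulo hash, a new size that is an integer multiple of the old size, and hypothetical names (expand_table, unzip_step, publish_pointer, wait_for_grace_period). It reuses the old bucket array as a cursor and performs one separation step per old bucket between grace periods, which is not necessarily the block-by-block flow of the illustrated embodiment.

```c
#include <assert.h>
#include <stdlib.h>

struct node  { unsigned long key; struct node *next; };
struct table { size_t size; struct node **buckets; };

/* Stand-ins for rcu_assign_pointer() and synchronize_rcu(); not concurrency-safe. */
#define publish_pointer(p, v) ((p) = (v))
static void wait_for_grace_period(void) { }

/* One separation step on old bucket b; old->buckets[b] serves as a cursor. */
static void unzip_step(struct table *old, const struct table *nt, size_t b)
{
    struct node *p = old->buckets[b];
    if (!p)
        return;

    size_t h = p->key % nt->size;
    struct node *he;

    /* Advance p to the last node of the leading run that hashes to h. */
    for (he = p->next; he && he->key % nt->size == h; he = he->next)
        p = he;
    old->buckets[b] = p->next;      /* next step starts at the following run */

    /* Skip nodes that belong to other new buckets. */
    while (he && he->key % nt->size != h)
        he = he->next;

    /* Relink p to the next node of its own new bucket, or to NULL. */
    publish_pointer(p->next, he);
}

/* Expand *tablep to new_size buckets; new_size must be a multiple of the old size. */
static struct table *expand_table(struct table **tablep, size_t new_size)
{
    struct table *old = *tablep;
    assert(old->size > 0 && new_size % old->size == 0);

    struct table *nt = malloc(sizeof(*nt));
    if (!nt)
        return NULL;
    nt->size = new_size;
    nt->buckets = calloc(new_size, sizeof(*nt->buckets));
    if (!nt->buckets) {
        free(nt);
        return NULL;
    }

    /* Point each new bucket at the first element that hashes to it. */
    for (size_t j = 0; j < new_size; j++) {
        struct node *n = old->buckets[j % old->size];
        while (n && n->key % new_size != j)
            n = n->next;
        nt->buckets[j] = n;
    }
    publish_pointer(*tablep, nt);

    int done;
    do {
        /* No reader may still depend on a link that the next pass removes. */
        wait_for_grace_period();
        done = 1;
        for (size_t b = 0; b < old->size; b++) {
            unzip_step(old, nt, b);
            if (old->buckets[b])
                done = 0;
        }
    } while (!done);

    free(old->buckets);
    free(old);
    return nt;
}
```

In this arrangement the number of passes, and hence the number of grace periods, is bounded by the length of the longest chain in the original table. The first grace period also ensures that no reader is still searching the original table before its bucket array is reused as a cursor.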
Readers
Advantageously, the foregoing resizing techniques allow readers to perform concurrent read operations during hash table resizing without incurring any significant overhead. To access an RCU-protected hash table for reading, the reader initiates an RCU read-side critical section, for example, using the rcu_read_lock( ) primitive. The only additional step required of the reader is to snapshot the original hash table pointer in case an updater replaces the pointer during the reader's lookup operation. This represents a simple fetch and store sequence to create a local copy of the pointer. Once the reader has done this, it may search the hash table in conventional fashion, as by (1) hashing the desired key, modulo the number of buckets, (2) searching for the corresponding hash bucket, (3) traversing the hash bucket's linked list, comparing each element's key to the desired key, and (4) carrying out the desired read operation on the hash table element whose key matches the desired key. Thereafter, the reader may exit the RCU read-side critical section, for example, using the rcu_read_unlock( ) primitive. In this way, readers search only one hash bucket, as required.
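The reader-side sequence described above may be sketched in C as follows. The sketch is a single-threaded illustration in which read_lock, read_unlock and load_pointer are hypothetical stand-ins for rcu_read_lock( ), rcu_read_unlock( ) and an RCU pointer fetch. The struct layouts, the value field, the modulo hash and the name lookup_value are likewise assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

struct node  { unsigned long key; long value; struct node *next; };
struct table { size_t size; struct node **buckets; };

/* Stand-ins for rcu_read_lock() and rcu_read_unlock(); no-ops in this sketch. */
static void read_lock(void) { }
static void read_unlock(void) { }
/* Stand-in for an RCU-protected pointer fetch; a plain load in this sketch. */
#define load_pointer(p) (p)

/*
 * Look up key and copy its value to *out while still inside the
 * read-side critical section. Returns 1 if found, 0 otherwise.
 */
static int lookup_value(struct table *const *tablep, unsigned long key, long *out)
{
    int found = 0;

    read_lock();

    /* Snapshot the table pointer once; an updater may replace it at any time. */
    struct table *t = load_pointer(*tablep);
    assert(t->size > 0);

    /* Hash the key modulo the bucket count of the snapshot, then walk one chain. */
    struct node *n = load_pointer(t->buckets[key % t->size]);
    while (n && n->key != key)
        n = load_pointer(n->next);

    if (n) {
        *out = n->value;    /* the read operation on the matching element */
        found = 1;
    }

    read_unlock();
    return found;
}
```

Because the bucket count is taken from the same snapshot as the bucket array, the reader never mixes the size of one table version with the buckets of another. The key comparison on every element also tolerates chains that temporarily contain elements belonging to other buckets during a resize.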
Example Computing Environment
Turning now to the
The computer system 102 may represent any of several different types of computing apparatus. Examples of such apparatus include, but are not limited to, general purpose computers, special purpose computers, portable computing devices, communication and/or media player devices, set-top devices, embedded systems, and other types of information handling machines. The term “processor” as used with reference to the processors 104₁, 104₂ . . . 104ₙ encompasses any logical execution unit capable of executing program instructions, including but not limited to a packaged integrated circuit device (such as a microprocessor), a processing core within a packaged integrated circuit device (such as a microprocessor core), or a hardware thread comprising one or more functional units within a processing core (such as an SMT thread). The processors 104₁, 104₂ . . . 104ₙ may be situated within a single computing device or node (e.g., as part of a single-node SMP system) or they may be distributed over plural nodes (e.g., as part of a NUMA system, a cluster, or a cloud). The memory 108 may comprise any type of tangible storage medium capable of storing data in computer readable form for use in program execution, including but not limited to, any of various types of random access memory (RAM), various flavors of programmable read-only memory (PROM) (such as flash memory), and other types of primary storage (i.e., program memory). The cache memories 110₁, 110₂ . . . 110ₙ may be implemented in several levels (e.g., as level 1, level 2 and level 3 caches) and the cache controllers 112₁, 112₂ . . . 112ₙ may collectively represent the cache controller logic that supports each cache level. As illustrated, the memory controller 114 may reside separately from processors 104₁, 104₂ . . . 104ₙ, for example, as part of a discrete chipset. Alternatively, the memory controller 114 could be provided by plural memory controller instances that are respectively integrated with the processors 104₁, 104₂ . . . 104ₙ.
Each of the processors 104₁, 104₂ . . . 104ₙ is operable to execute program instruction logic under the control of a software program stored in the memory 108 (or elsewhere). As part of this program execution logic, update operations (updaters) 118 will periodically execute within a process, thread, or other execution context (hereinafter “task”) on the processors 104₁, 104₂ . . . 104ₙ to perform hash table resizing on the hash table 116. Reference numerals 118₁, 118₂ . . . 118ₙ illustrate individual updaters that may execute from time to time on the various processors 104₁, 104₂ . . . 104ₙ. Each of the processors 104₁, 104₂ . . . 104ₙ also periodically executes read operations (readers) 120 on the hash table 116. Reference numerals 120₁, 120₂ . . . 120ₙ illustrate individual readers that may execute from time to time on the various processors 104₁, 104₂ . . . 104ₙ. Each search operation is assumed to entail an element-by-element traversal of a bucket (implemented as an RCU-protected linked list) until one or more items representing the target of the search are found. In order to support concurrent hash table operations, such search operations may be performed using a lock-free synchronization mechanism, such as read-copy update.
To facilitate synchronized updater-reader access to the hash table 116, the several processors 104₁, 104₂ . . . 104ₙ are programmed to implement an RCU subsystem 122 by periodically executing respective RCU instances 122₁, 122₂ . . . 122ₙ as part of their operating system functions or user-mode operations. As shown in
Accordingly, a technique for optimized resizing of RCU-protected hash tables has been disclosed. It will be appreciated that the foregoing concepts may be variously embodied in any of a data processing system, a machine implemented method, and a computer program product in which programming logic is provided by one or more machine-usable storage media for use in controlling a data processing system to perform the required functions. Example embodiments of a data processing system and machine implemented method were previously described in connection with
Example data storage media for storing such program instructions are shown by reference numerals 108 (memory) and 110 (cache) of the computer system 102 of
Although various example embodiments have been shown and described, it should be apparent that many variations and alternative embodiments could be implemented in accordance with the disclosure. It is understood, therefore, that the invention is not to be in any way limited except in accordance with the spirit of the appended claims and their equivalents.
Number | Date | Country | |
---|---|---|---|
20130151488 A1 | Jun 2013 | US |