1. Field of the Invention
The present invention generally relates to memory architectures in computer systems and, more particularly, to a system and method for supporting transactional memory.
2. Description of the Prior Art
Atomic transactions have been widely used in parallel computing and transaction processing. An atomic transaction generally refers to the execution of multiple operations, such that the multiple operations appear to be executed together without any intervening operations. For example, if a memory address is accessed within an atomic transaction, the memory address should not be modified elsewhere until the atomic transaction completes. Thus, if a processor (or a thread in a multithreading environment) uses an atomic transaction to access a set of memory addresses, the atomic transaction semantics should guarantee that another processor (or another thread) cannot modify any of the memory addresses throughout the execution of the atomic transaction.
Atomic transactions can be implemented at the architecture level via architecture and micro-architecture support, rather than at the software level via semaphores and synchronization instructions. Architecture-level atomic transactions can potentially improve overall performance through speculative execution of atomic transactions and the elimination of semaphore use. Supporting atomic transactions architecturally often requires expensive hardware and software enhancements, such as large on-chip buffers for data of uncommitted atomic transactions, and software-managed memory regions for on-chip buffer overflows. Various architecture mechanisms supporting atomic transactions have been proposed. Architecture support of atomic transactions needs to provide conflict detection between atomic transactions, and data buffering for uncommitted transactional state. Conflict between different atomic transactions accessing the same memory locations is usually detected by hardware on-the-fly. This can be achieved with reasonable implementation cost and complexity because the underlying cache coherence mechanism of the system can be used. However, it can be quite challenging to buffer data for uncommitted transactions with reasonable cost and complexity, if an atomic transaction can modify a large number of memory locations that cannot fit in an on-chip buffer (a dedicated buffer, on-chip L1/L2 caches, or a combination of both).
Existing architecture support of atomic transactions either requires that an atomic transaction ensure that buffer overflow cannot happen, or falls back to a software solution when buffer overflow happens. The first approach inevitably limits the use of atomic transactions. The second approach often requires software to acquire a semaphore such as a global lock (that protects the whole address space) to ensure atomicity of memory accesses. Falling back to a global lock in the event of buffer overflow inevitably limits concurrency, since the lack of fine-granular tracking of read and write sets requires that all transactions (not only the overflowing one) acquire the same lock.
One prior art reference in particular, entitled “801 storage: Architecture and Programming” in ACM Transactions on Computer Systems (TOCS), Vol. 6, pages 28-50, 1988 to A. Chang, et al., describes a storage system architecture that maintains lock information in the page table entries to record the activity of transactions on different regions of memory and to facilitate conflict resolution. However, in this teaching, speculative state is virtualized and consequently there is no distinction between the overflow and non-overflow cases. Further, the speculative data is held in main memory and the non-speculative data is held on disk. The lockbits implemented are used as a reservation mechanism for several concurrent transactions, not just by a single overflowing transaction.
It would be highly desirable to provide a mechanism for handling the case of transaction state overflow in hardware-based transactional memory systems.
The present invention is directed to a novel transactional memory support system for handling memory-based atomic operations for processors in a multiprocessing environment.
In one aspect, the present invention is directed to a novel transactional memory support system for handling buffer overflow conditions using a lock in hardware-based transactional memory systems.
More particularly, the system and method of the present invention provides a solution that improves the situation by not requiring that a transaction execution be serialized in the event of buffer-overflow. When a transaction encounters a buffer-overflow, a fine-granular, hardware-supported locking mechanism is used to reserve memory locations that are accessed by the overflowing transaction. The operation of concurrent, non-overflowing transactions is minimally affected.
Furthermore, the system and method of the present invention facilitates the execution of non-revocable operations such as I/O during a transaction. The handling of non-revocable operations includes establishing the guarantee that a transaction can commit before executing the non-revocable operation.
Furthermore, in accordance with the present invention, the use of a fine-granular, hardware-supported locking mechanism facilitates the execution of a non-revocable operation inside a transaction without restricting execution of ordinary revocable memory access operations in concurrent transactions.
Thus, in accordance with one aspect of the invention, there is provided a system, method and computer program product for processing overflow transactions in a hardware-based transactional memory system. The transactional memory system is provided in a multiprocessing system having one or more processor devices and a shared memory storage system, and implements a best effort hardware transactional memory system. The method includes a locking means enabling the acquiring, by a processor device, of lockbits associated with a memory structure of said shared memory storage system to be reserved when a transaction transits from a non-overflow to an overflow mode or is already in overflow mode. The lockbits determine the granularity at which memory reservations for an overflow transaction are recorded. The method includes a control mechanism for controlling concurrency between overflowing and non-overflowing transactions requested by processor devices in the multiprocessing system, the method enabling only one overflowing transaction to execute at a time in the multiprocessing system.
Further to this aspect of the invention, the lockbits are acquired by a processor device at the time of a transactional read or write operation executed by an overflow transaction.
Additionally, the method ensures that the status of an overflowing transaction's read set and write set is validated prior to acquiring the lockbits.
According to the invention, a lockbit field including the lockbits associated with said memory structure is provided in a page table entry used for translating a virtual address to a physical address.
According to a further aspect of the invention, the lockbits specify a granularity at which memory reservations for an overflow transaction are recorded.
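As a minimal sketch of how a lockbit field in a page table entry might record reservations at a fixed granularity, consider the following. The page size, the number of lockbits per page, and all names (`PageTableEntry`, `lockbit_index`, `reserve`, `is_reserved`) are assumptions chosen for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096           # assumed page size in bytes
LOCKBITS_PER_PAGE = 32     # assumed width of the lockbit field
REGION_SIZE = PAGE_SIZE // LOCKBITS_PER_PAGE  # bytes covered by one lockbit


@dataclass
class PageTableEntry:
    """Hypothetical page table entry extended with a lockbit field."""
    physical_page: int
    lockbits: int = 0  # bit i set means region i of the page is reserved


def lockbit_index(vaddr: int) -> int:
    """Map a virtual address to the lockbit covering it within its page."""
    return (vaddr % PAGE_SIZE) // REGION_SIZE


def reserve(pte: PageTableEntry, vaddr: int) -> None:
    """Record a reservation for the region containing vaddr."""
    pte.lockbits |= 1 << lockbit_index(vaddr)


def is_reserved(pte: PageTableEntry, vaddr: int) -> bool:
    """Check whether the region containing vaddr is reserved."""
    return bool((pte.lockbits >> lockbit_index(vaddr)) & 1)
```

With these assumed sizes, each lockbit covers 128 bytes, so two addresses in the same 128-byte region share one reservation. A wider lockbit field would give finer granularity at the cost of a larger page table entry.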
Further to this aspect of the invention, the controlling of concurrency between overflowing transactions in the multiprocessing system comprises:
setting an overflow flag associated with a processor device when that processor device transits to a transaction overflow mode; and,
when a processor device desires to transit to a transaction overflow mode, determining whether any other processor device in the multiprocessor system is in or about to transit to an overflow transaction state; and,
preventing the processor device from transiting to the overflow transaction state when it is detected that another processor device in the multiprocessor system is in or about to transit to an overflow transaction state.
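The steps above can be sketched in software as follows. This is only an illustration under stated assumptions: the class `OverflowArbiter` and its methods are hypothetical names, and a `threading.Lock` stands in for whatever hardware arbitration makes the check-and-set step atomic.

```python
import threading


class OverflowArbiter:
    """Hypothetical model: at most one processor may be in overflow mode."""

    def __init__(self, num_processors: int) -> None:
        # Stand-in for hardware arbitration that makes check-and-set atomic.
        self._guard = threading.Lock()
        # One overflow flag per processor device.
        self.overflow_flag = [False] * num_processors

    def try_enter_overflow(self, cpu: int) -> bool:
        """Set this processor's flag unless another processor's flag is set."""
        with self._guard:
            others = (f for i, f in enumerate(self.overflow_flag) if i != cpu)
            if any(others):
                return False  # prevented from transiting to overflow mode
            self.overflow_flag[cpu] = True
            return True

    def leave_overflow(self, cpu: int) -> None:
        """Clear this processor's flag when its overflow transaction ends."""
        with self._guard:
            self.overflow_flag[cpu] = False
```

A processor whose request is refused would, in this sketch, have to abort or retry its transaction; which of the two it does is a policy choice outside the code shown.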
Furthermore, the lockbits are inspected for detecting memory access conflicts between overflowing and non-overflowing transactions by processor devices. A memory access conflict occurs when transactions concurrently request access to the same memory address and at least one access is a write.
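The conflict condition stated above can be written as a small predicate. The `Access` type and the `conflicts` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Access(NamedTuple):
    """Hypothetical record of one memory access by a transaction."""
    address: int
    is_write: bool


def conflicts(a: Access, b: Access) -> bool:
    """Two concurrent accesses conflict if they touch the same address
    and at least one of them is a write."""
    return a.address == b.address and (a.is_write or b.is_write)
```

Two concurrent reads of the same address do not conflict under this definition, which is why read-sharing between transactions need not cause an abort.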
Furthermore, the overflow flag includes a system overflow flag indicating that any processor in said system is transiting to or is in an overflow mode. Each processor executing non-transactional memory access operations first checks said system overflow flag and a lockbit for a requested memory address prior to accessing a memory location associated with that address for a memory operation.
Furthermore, the preventing comprises delaying a processor's non-transactional memory access operation until said system overflow flag and acquired lockbits for that requested memory location are cleared.
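One possible reading of this delay is sketched below: a non-transactional access stalls only while the system overflow flag is set and the lockbit for the requested location is also set. That reading, the polling loop, and every name here (`non_transactional_access`, the callback parameters, `max_polls`) are assumptions made for illustration; hardware would stall the access rather than poll in software.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def non_transactional_access(
    read_overflow_flag: Callable[[], bool],
    read_lockbit: Callable[[], bool],
    do_access: Callable[[], T],
    poll_interval: float = 0.0,
    max_polls: int = 1000,
) -> T:
    """Delay the access while an overflow transaction holds the location."""
    for _ in range(max_polls):
        # Assumed fast path: proceed if no processor is in overflow mode,
        # or if the requested location is not reserved.
        if not (read_overflow_flag() and read_lockbit()):
            return do_access()
        time.sleep(poll_interval)
    raise TimeoutError("location still reserved after max_polls checks")
```

The `max_polls` bound exists only so the sketch cannot spin forever; it does not correspond to anything in the described method.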
The present invention is advantageously employed in a multiprocessing computer system, which may be implemented in System-on-Chip integrated circuit designs having a plurality of processor devices each accessing a shared memory structure; however, it can easily be adapted for use in other types of multiprocessor computing systems.
The objects, features and advantages of the present invention will become apparent to one skilled in the art, in view of the following detailed description taken in combination with the attached drawings, in which:
The present invention proposes a new mechanism to handle the case of transaction state overflow in best effort hardware-based transactional memory systems (BET). The invention utilizes the mechanisms for read-set, write-set tracking and buffering of the BET system in the case of non-overflowing transactions and uses a combination of said mechanisms with per memory reservation structures (lockbits, e.g., at a per page or line granularity) in the event of transaction overflow. Reservations (lockbits) are used to control concurrency and conflict detection between the overflowing and other non-overflowing transactions. The design permits at most one overflowing transaction.
The cost of overflow is largely paid by the transaction that overflows. The costs incurred for concurrent non-overflowing transactions are twofold: (i) a check of lockbits at each load and store access, and (ii) polite conflict management (a non-overflowing transaction steps back in favor of an overflowing transaction).
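The two costs can be illustrated with a short sketch of a non-overflowing transaction's access path. It assumes that "stepping back" means the non-overflowing transaction aborts itself, and it conservatively treats any reserved region as a conflict; the names `TransactionAborted`, `checked_access`, and the 128-byte region size are hypothetical.

```python
from typing import Set


class TransactionAborted(Exception):
    """Raised when a non-overflowing transaction steps back."""


def checked_access(address: int, reserved_regions: Set[int],
                   region_size: int = 128) -> int:
    """Check the lockbit for the accessed region before a load or store.

    reserved_regions models the lockbits set by the overflowing transaction.
    Returns the region index when the access may proceed.
    """
    region = address // region_size
    if region in reserved_regions:
        # Polite conflict management: yield to the overflowing transaction.
        raise TransactionAborted(f"region {region} is reserved")
    return region
```

When no transaction is in overflow mode the set of reserved regions is empty, so the only cost in this sketch is the membership check itself.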
The proposed invention is an extension to the following technology: 1) A best-effort hardware transactional memory system (BET); 2) An invalidation-based cache coherence protocol. Such a protocol is required to enable an overflowing transaction to acquire ownership of data read and written by non-overflowing transactions; and, 3) A mechanism for tracking fine-granular, precise reservations for contiguous sections of memory (lockbits). In one embodiment, it is assumed that such reservation functionality is implemented as an extension of the memory address translation mechanism, i.e., as an extension of the page table and associated caching structures. The caching structures are associated with individual processors and enable efficient access to page table entries by said processor. An implementation of such an address translation cache is, for example, a translation lookaside buffer (TLB). When implementing the present invention, a TLB entry includes the same extension as the page table entries, i.e., the lockbit field as shown and described with reference to
Besides handling transaction overflow, the proposed invention facilitates execution of non-revocable operations, e.g., I/O, inside transactions, e.g., by forcing a transaction to overflow mode when a non-revocable operation is requested. The rationale behind this mechanism is that a transaction that executes in overflow mode is guaranteed to execute successfully to completion (commit).
Embodiments of the present invention include, inter alia, (i) use of a cache-based best effort transactional memory system to handle non-overflowing transactions, (ii) use of a reservation-based mechanism, herein referred to as “lockbits”, to handle overflow case, (iii) a mechanism that guarantees that there is at most one overflowing transaction in a multiprocessor system, and (iv) a mechanism that establishes reservations at the time of transaction overflow through a traversal of the transaction buffer of the overflowing transaction.
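Item (iv) can be pictured with the following sketch, which walks the addresses held in an overflowing transaction's buffer and sets a reservation for each region touched. Modeling the buffer as read and write address sets, the `sets_valid` argument (standing in for the validation of the read set and write set before lockbits are acquired), and the 128-byte region size are all assumptions for illustration.

```python
from typing import Set


def establish_reservations(read_set: Set[int], write_set: Set[int],
                           lockbits: Set[int], sets_valid: bool,
                           region_size: int = 128) -> bool:
    """Traverse the transaction buffer and reserve every region it touches.

    lockbits is a set of reserved region indices, updated in place.
    Returns False without reserving anything if validation failed.
    """
    if not sets_valid:
        return False  # the caller would abort instead of overflowing
    for address in sorted(read_set | write_set):
        lockbits.add(address // region_size)
    return True
```

For example, a buffer holding addresses 0, 64, 130, and 4096 would reserve regions 0, 1, and 32 under the assumed region size, because addresses 0 and 64 fall in the same region.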
One aspect of the present invention that extends the use of a cache-based best effort transactional memory system to handle non-overflowing transactions, is now described. In accordance with this aspect, as shown in
A further aspect of the present invention that extends the use of a reservation-based mechanism, herein referred to as “lockbits”, to handle the overflow case, is now described with respect to
With reference now had to
A further aspect of the present invention that guarantees that there is at most one overflowing transaction in a multiprocessor system, is now described. Particularly, when two or more transactions in the system overflow simultaneously, one of the transactions is aborted. Limiting the use of the lock-based overflow mechanism to a single processor ensures absence of deadlock. It is understood that more sophisticated policies, like blocking, are also possible. The present invention thus includes a mechanism that can establish consensus among all processors such that only a single processor at a time is processing a transaction in overflow mode. A description of the architectural extensions and protocols that achieve this functionality is now described with respect to
In one embodiment of the invention, information about the overflow status of a transaction is specified by the processor status register (PSR) 501, as illustrated in
A further aspect of the present invention that establishes reservations at the time of transaction overflow through a traversal of the transaction buffer of the overflowing transaction in a multiprocessor system, is now described with respect to
Details regarding the protocol 800 for transaction memory access load and store operations in accordance with the invention are illustrated in
The mechanism of overflow handling described in this disclosure can maintain strong atomicity semantics if the underlying best effort transactional memory system does so. Such a mechanism is described in the reference to C. Blundell, Ch. Lewis and M. Martin entitled “Deconstructing Transactions: The Subtleties of Atomicity”, Fourth Annual Workshop on Duplicating, Deconstructing, and Debunking, June, 2005, the whole contents and disclosure of which are incorporated by reference as if fully set forth herein. Strong atomicity means that the atomicity and isolation guarantees of a transaction hold not only with respect to other transactions but also with respect to concurrent non-transactional memory access. To support strong atomicity semantics, the method of non-transactional memory access should be extended as illustrated in
The situation is slightly different for a store access as illustrated in
It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In one embodiment, the present invention may be implemented in software as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture.
Referring to
While there has been shown and described what is considered to be preferred embodiments of the invention, it will, of course, be understood that various modifications and changes in form or detail could readily be made without departing from the spirit of the invention. It is therefore intended that the invention be not limited to the exact forms described and illustrated, but should be construed to cover all modifications that may fall within the scope of the appended claims.