In concurrent computing systems, particularly those with multi-core processors or multiple processors, concurrently executing processes often need to arbitrate entry into a critical section of a program. This is typically because a program executing in the critical section accesses a resource that permits only exclusive access, so all other programs must be excluded from accessing it simultaneously.
Many methods are known for such arbitration. For example, programs may achieve mutual exclusion for a critical section using test-and-test-and-set (TTS) locks or reader-writer locks, each well known in the art. Alternatively, for certain applications, a queue-based lock may be used. Well-known queue-based locks include Ticket locks, Mellor-Crummey Scott (MCS) locks, and Craig, Landin, and Hagersten (CLH) locks. MCS and Ticket locks are described in, for example, J. M. Mellor-Crummey and M. Scott, Algorithms for Scalable Synchronization on Shared Memory Multiprocessors, ACM Transactions on Computer Systems, vol. 9, no. 1, February 1991. CLH locks are described, for example, in Michael L. Scott and William N. Scherer III, Scalable Queue-Based Spin Locks with Timeout, Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, pp. 44-52, 2001.
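To make the two simplest lock kinds named above concrete, the following is a minimal sketch, using C11 atomics, of a TTS lock and a Ticket lock. The type and function names (`tts_lock`, `ticket_lock`, and so on) are hypothetical illustrations, not code from the cited papers. The Ticket lock is queue based only implicitly: the ticket numbers define a first-in, first-out order among waiters.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Test-and-test-and-set (TTS) lock: a single flag (hypothetical names). */
typedef struct {
    atomic_bool held;
} tts_lock;

static void tts_acquire(tts_lock *l) {
    for (;;) {
        /* "Test": spin on an ordinary load, which can be served from the local cache. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
        /* "Test-and-set": only now attempt the atomic exchange. */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
    }
}

static void tts_release(tts_lock *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/* Ticket lock: waiters take increasing tickets and are served in order. */
typedef struct {
    atomic_uint next_ticket; /* next ticket to hand out */
    atomic_uint now_serving; /* ticket currently allowed to enter */
} ticket_lock;

static void ticket_acquire(ticket_lock *l) {
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1u, memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        ; /* busy-wait until this ticket is served */
}

static void ticket_release(ticket_lock *l) {
    /* Only the holder advances now_serving, handing the lock to the next ticket. */
    atomic_fetch_add_explicit(&l->now_serving, 1u, memory_order_release);
}
```

The TTS lock spins on a read before attempting the atomic exchange so that waiting processors do not flood the interconnect with writes; the Ticket lock additionally guarantees that waiters enter in arrival order.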
A technique termed Speculative Lock Elision (SLE) may be used to reduce unnecessary serialization among concurrent processes that access the same lock-related variables or wait on the same lock queue. SLE dynamically removes unnecessary lock-induced serialization, relying on the property that a lock need not always be acquired for correct execution. Synchronization instructions that are predicted to be unnecessary, such as those that test or set locks, are bypassed, or elided. This allows multiple threads to execute critical sections protected by the same lock concurrently without actually acquiring the lock. Misspeculation due to inter-thread data conflicts is detected using existing cache mechanisms, and rollback is used for recovery. A successful speculative elision is validated and committed without acquiring the lock. See Ravi Rajwar and James R. Goodman, Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution, Proceedings of the 34th International Symposium on Microarchitecture (MICRO), 2001. Currently known approaches to SLE are, however, limited to the elision of simple non-queued locks such as TTS locks.
Hardware support for lock-free shared data structures using transactional memory is described in M. Herlihy and J. Moss, Transactional Memory: Architectural Support for Lock-Free Data Structures, Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993 (Herlihy and Moss). This approach describes a set of extensions to existing multiprocessor cache coherence protocols that enable such lock-free access. Transactions using a transactional memory are referred to herein as transactional memory transactions or lock-free transactions.
Referring to
In one embodiment, a processor system such as that depicted in the figure adds a transactional memory system 100 that allows lock-free transactions to execute with shared data structures cached in the transactional memory system, as described in Herlihy and Moss. The processor(s) 105 may then include an instruction set architecture that supports such lock-free, or transactional memory based, transactions. In such an architecture, the system in this embodiment supports a set of instructions including an instruction to begin a transaction, an instruction to terminate a transaction normally, and an instruction to abort a transaction.
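The semantics of the three instructions (begin, terminate normally, abort) can be illustrated in software. The sketch below is a hypothetical stand-in, not the hardware mechanism of Herlihy and Moss: a single global spin lock serializes transactions, so unlike hardware transactional memory it is neither lock-free nor concurrent, and an undo log lets an abort restore every word written since the transaction began. The names `tx_begin`, `tx_write`, `tx_end`, and `tx_abort` and the log size are assumptions for illustration only.

```c
#include <stdatomic.h>
#include <stddef.h>

#define TX_LOG_MAX 64 /* hypothetical capacity of the undo log */

/* One global spin lock serializes all transactions (software stand-in only). */
static atomic_flag tx_global = ATOMIC_FLAG_INIT;

/* Undo log: address and previous value of every word written. */
static struct {
    long *addr;
    long old;
} tx_log[TX_LOG_MAX];
static size_t tx_log_len;

/* Begin a transaction. */
static void tx_begin(void) {
    while (atomic_flag_test_and_set_explicit(&tx_global, memory_order_acquire))
        ; /* another transaction is running */
    tx_log_len = 0;
}

/* Transactional store: log the old value, then write.
   Returns 0 if the log is full, in which case the caller should abort. */
static int tx_write(long *addr, long val) {
    if (tx_log_len == TX_LOG_MAX)
        return 0;
    tx_log[tx_log_len].addr = addr;
    tx_log[tx_log_len].old = *addr;
    tx_log_len++;
    *addr = val;
    return 1;
}

/* Terminate normally: keep all writes and leave the transaction. */
static void tx_end(void) {
    atomic_flag_clear_explicit(&tx_global, memory_order_release);
}

/* Abort: undo writes newest-first, then leave the transaction. */
static void tx_abort(void) {
    while (tx_log_len > 0) {
        tx_log_len--;
        *tx_log[tx_log_len].addr = tx_log[tx_log_len].old;
    }
    atomic_flag_clear_explicit(&tx_global, memory_order_release);
}
```

Because transactions in this stand-in run one at a time, reads need no tracking. A hardware implementation instead tracks read and write sets in the cache and detects conflicts through the coherence protocol, which is what allows non-conflicting transactions to run concurrently.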
The system of
Transactional memory transactions provide a way to implement speculative lock elision for reader-writer locks, which are well known in the art, and for queue-based locks such as the CLH, Ticket, and Mellor-Crummey Scott (MCS) locks introduced above.
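For a reader-writer lock, one natural reading of "free" is that no writer holds the lock and no readers are active. The minimal sketch below packs both into a single atomic word; the layout (bit 0 marks a writer, the remaining bits count readers) and all names are hypothetical. An eliding thread could evaluate `rw_is_free` inside its transaction in the same way as for the other lock types.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RW_WRITER 1u /* bit 0: a writer holds the lock */
#define RW_READER 2u /* each active reader adds 2 */

typedef struct { atomic_uint word; } rw_lock;

static void rw_read_acquire(rw_lock *l) {
    for (;;) {
        unsigned w = atomic_load_explicit(&l->word, memory_order_relaxed);
        /* Register as a reader only while no writer holds the lock. */
        if (!(w & RW_WRITER) &&
            atomic_compare_exchange_weak_explicit(&l->word, &w, w + RW_READER,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
    }
}

static void rw_read_release(rw_lock *l) {
    atomic_fetch_sub_explicit(&l->word, RW_READER, memory_order_release);
}

static void rw_write_acquire(rw_lock *l) {
    for (;;) {
        unsigned expected = 0u; /* no writer and no readers */
        if (atomic_compare_exchange_weak_explicit(&l->word, &expected, RW_WRITER,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
    }
}

static void rw_write_release(rw_lock *l) {
    atomic_fetch_and_explicit(&l->word, ~RW_WRITER, memory_order_release);
}

/* The status check an eliding thread would perform. */
static bool rw_is_free(rw_lock *l) {
    return atomic_load_explicit(&l->word, memory_order_acquire) == 0u;
}
```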
Turning first to
If the lock is not free at 240, there are two possibilities. First, another process may actually be using the critical section, in which case lock elision has to be abandoned. This is done by aborting the transaction at 250 and then, at 260, acquiring the lock or enqueuing the process in the lock acquisition queue. Once the lock has been acquired, the critical section may be executed exclusively at 270, protected by the lock. On this path through the flow diagram (i.e., 240-250-260-270), the transaction is aborted at 250 to ensure correct atomic execution: other threads may be attempting to use the transaction, and if the transaction were not aborted, conflicts between those threads and the thread that has acquired the lock could go unnoticed.
It is possible for the Acquire Lock with Elision process to be invoked recursively, thus allowing recursive locking. There is no inherent limitation in the embodiment that prevents recursive locking from being correctly implemented. The path 240-250-260 in
If the lock is free at 240, then any other process using the critical section must also be in a transaction, and is therefore protected from undetected conflicts by the transactional memory mechanisms, so lock acquisition may be elided. In this case, the process simply enters the critical section at 270 and elides the lock acquisition step. As explained above, the atomicity of the transactional memory based transaction guarantees that if the transaction completes successfully, the critical section will have executed correctly without interference from other concurrent processes.
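The acquire flow described above (check at 230 and 240, abort at 250, acquire at 260, critical section at 270) can be sketched for a TTS lock as follows. This is a minimal sketch under stated assumptions: `tx_begin`, `tx_end`, and `tx_abort` are hypothetical single-threaded stand-ins for the begin, end, and abort instructions that only record whether a transaction is active; they provide none of the isolation or rollback that hardware transactional memory supplies. The hypothetical `tx_available` switch models a transaction that cannot be started, a case the flow above does not address.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* HYPOTHETICAL stand-ins for the transaction instructions. They track
   state only; real hardware would provide isolation and rollback. */
static bool tx_available = true; /* can a transaction be started? */
static bool tx_active;
static bool tx_begin(void) { tx_active = tx_available; return tx_active; }
static void tx_end(void)   { tx_active = false; }
static void tx_abort(void) { tx_active = false; }

typedef struct { atomic_bool held; } tts_lock;

/* Step 230: for a TTS lock the status check is a single load. With hardware
   transactional memory, this read also places the lock in the transaction's
   read set, so a later acquisition by another thread aborts the transaction. */
static bool tts_is_free(tts_lock *l) {
    return !atomic_load_explicit(&l->held, memory_order_acquire);
}

static void tts_acquire(tts_lock *l) {
    for (;;) {
        while (!tts_is_free(l))
            ; /* busy-wait */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
    }
}

static void tts_release(tts_lock *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/* Acquire Lock with Elision. Returns true if the acquisition was elided. */
static bool tts_acquire_elided(tts_lock *l) {
    if (tx_begin()) {
        if (tts_is_free(l))
            return true;  /* 240 -> 270: run the critical section in the transaction */
        tx_abort();       /* 240 -> 250: lock in use, abandon elision */
    }
    tts_acquire(l);       /* 260: acquire the lock, then run 270 exclusively */
    return false;
}

/* Release: a free lock means the acquisition was elided. */
static void tts_release_elided(tts_lock *l) {
    if (tts_is_free(l))
        tx_end();         /* commit the transaction */
    else
        tts_release(l);
}
```

On hardware transactional memory such as Intel's RTM, an abort transfers control back to the begin point, which then reports failure; the stand-in `tx_abort` simply returns, which in this sketch leads to the same fallback path.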
As indicated, the processing in the figure, and its correctness, are independent of the underlying lock mechanism used to check the status of the lock at 230 and to acquire the lock at 260. In the case of a TTS lock, step 230 may simply be a test of the test-and-set variable that constitutes the TTS lock. In the case of an MCS, Ticket, or other queue-based data structure implementing a lock, the test may require checking a queue or other data structure. Similarly, acquiring the lock at 260 may require the process to enter a busy-wait loop or to block in the case of a TTS lock, or to enqueue itself in the case of an MCS, Ticket, or other queue-based lock.
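Only the status check at 230 (and the acquisition at 260) differs by lock type. The minimal sketch below shows one plausible "is the lock free?" test for a TTS lock, a Ticket lock, and an MCS lock; the data layouts follow the usual textbook form of these locks, and all names are hypothetical. Inside a hardware transaction, every variable read by the check joins the transaction's read set, so the two loads in the Ticket check would be validated together when the transaction commits.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* TTS lock: one flag. */
typedef struct { atomic_bool held; } tts_lock;

/* Ticket lock: free when every ticket handed out has been served. */
typedef struct { atomic_uint next_ticket, now_serving; } ticket_lock;

/* MCS lock: a queue of per-thread nodes; the lock records only the tail. */
typedef struct mcs_node {
    struct mcs_node *_Atomic next;
    atomic_bool locked;
} mcs_node;
typedef struct { mcs_node *_Atomic tail; } mcs_lock;

static bool tts_is_free(tts_lock *l) {
    return !atomic_load(&l->held);
}

static bool ticket_is_free(ticket_lock *l) {
    return atomic_load(&l->next_ticket) == atomic_load(&l->now_serving);
}

/* An empty queue (no tail node) means no holder and no waiters. */
static bool mcs_is_free(mcs_lock *l) {
    return atomic_load(&l->tail) == NULL;
}
```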
Many variations on these above described embodiments are possible. As discussed above the embodiment described in
Table 1 is a C-like program, in one embodiment, in which a system that supports transactional memory based transactions provides an implementation of Ticket locks with elision. An implementation such as that outlined in Table 1 could allow existing programs that use ticket locks to keep calling the same lock acquisition and release routines without any change to the calling program, while the implementation defined in the table transparently adds support for elision to the ticket locks.
The program shown in Table 1 essentially implements the flowcharts of
The symmetrical processing for lock release is listed at lines 17-23. The program first checks the lock at line 19. If the lock is free, the program has been executing within a transaction and the lock acquisition was elided, so it ends the transaction at line 20. Otherwise, it releases the lock at line 22.
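The following is a sketch of Ticket lock routines with the behavior described above; it is not the Table 1 listing itself. It assumes hypothetical transaction stand-ins (`tx_begin`, `tx_end`, `tx_abort`, and a `tx_available` switch) that record state but provide no isolation or rollback, and hypothetical wrapper names `ticket_lock_acquire` and `ticket_lock_release` that a calling program would use unchanged.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* HYPOTHETICAL transaction stand-ins: state tracking only, no isolation. */
static bool tx_available = true; /* can a transaction be started? */
static bool tx_active;
static bool tx_begin(void) { tx_active = tx_available; return tx_active; }
static void tx_end(void)   { tx_active = false; }
static void tx_abort(void) { tx_active = false; }

typedef struct { atomic_uint next_ticket, now_serving; } ticket_lock;

/* Free when no ticket is outstanding. */
static bool ticket_is_free(ticket_lock *l) {
    return atomic_load_explicit(&l->next_ticket, memory_order_acquire) ==
           atomic_load_explicit(&l->now_serving, memory_order_acquire);
}

static void ticket_acquire(ticket_lock *l) {
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1u, memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        ; /* busy-wait until this ticket is served */
}

static void ticket_release(ticket_lock *l) {
    atomic_fetch_add_explicit(&l->now_serving, 1u, memory_order_release);
}

/* Drop-in acquire: callers need not know whether the lock was elided. */
static void ticket_lock_acquire(ticket_lock *l) {
    if (tx_begin()) {
        if (ticket_is_free(l))
            return;          /* elided: critical section runs in the transaction */
        tx_abort();          /* lock in use: abandon elision */
    }
    ticket_acquire(l);
}

/* Symmetric release: a free lock means the acquisition was elided. */
static void ticket_lock_release(ticket_lock *l) {
    if (ticket_is_free(l))
        tx_end();            /* commit the transaction */
    else
        ticket_release(l);
}
```

The release check works because a thread that actually holds a Ticket lock always leaves `next_ticket` ahead of `now_serving`, whereas an elided acquisition takes no ticket at all.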
Table 2 depicts similar processing when the lock is an MCS lock. As may be observed from the C code segment outlined in the table, it is identical to Table 1 except that the lock is an MCS lock, and the calls that check the lock for availability, acquire it, and release it are the corresponding MCS lock calls.
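A corresponding sketch for an MCS lock follows; again it is not the Table 2 listing itself. It assumes the same kind of hypothetical transaction stand-ins (state tracking only, no isolation or rollback) and, as an assumption about the interface, passes each thread's queue node explicitly. The MCS acquire and release follow the standard algorithm of Mellor-Crummey and Scott; the elision wrappers differ from the Ticket lock case only in the lock calls.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL transaction stand-ins: state tracking only, no isolation. */
static bool tx_available = true;
static bool tx_active;
static bool tx_begin(void) { tx_active = tx_available; return tx_active; }
static void tx_end(void)   { tx_active = false; }
static void tx_abort(void) { tx_active = false; }

typedef struct mcs_node {
    struct mcs_node *_Atomic next;
    atomic_bool locked;
} mcs_node;
typedef struct { mcs_node *_Atomic tail; } mcs_lock;

static bool mcs_is_free(mcs_lock *l) {
    return atomic_load_explicit(&l->tail, memory_order_acquire) == NULL;
}

static void mcs_acquire(mcs_lock *l, mcs_node *me) {
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* Append this node to the queue. */
    mcs_node *prev = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev != NULL) {
        atomic_store_explicit(&prev->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            ; /* spin on our own node until the predecessor hands over */
    }
}

static void mcs_release(mcs_lock *l, mcs_node *me) {
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* No known successor: try to empty the queue. */
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire))
            return;
        /* A successor is enqueuing; wait for it to link itself. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            ;
    }
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

/* Elision wrappers: only the lock calls differ from the Ticket lock case. */
static bool mcs_acquire_elided(mcs_lock *l, mcs_node *me) {
    if (tx_begin()) {
        if (mcs_is_free(l))
            return true;      /* elided */
        tx_abort();           /* lock in use: abandon elision */
    }
    mcs_acquire(l, me);
    return false;
}

static void mcs_release_elided(mcs_lock *l, mcs_node *me) {
    if (mcs_is_free(l))
        tx_end();             /* acquisition was elided: commit */
    else
        mcs_release(l, me);
}
```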
As should be clear to one skilled in the art, the tables above are merely exemplary code fragments in one embodiment. In other embodiments, the implementation language may be another language, e.g., C++ or Java; the variable names used may vary; and the names of any of the functions implemented by the programs listed above may be changed arbitrarily without changing the input and output relationship, as is known.
In the preceding description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the described embodiments. However, one skilled in the art will appreciate that many other embodiments may be practiced without these specific details.
Some portions of the detailed description above are presented in terms of algorithms and symbolic representations of operations on data bits within a processor-based system. These algorithmic descriptions and representations are the means used by those skilled in the art to most effectively convey the substance of their work to others in the art. The operations are those requiring physical manipulations of physical quantities. These quantities may take the form of electrical, magnetic, optical or other physical signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the description, terms such as “executing” or “processing” or “computing” or “calculating” or “determining” or the like may refer to the action and processes of a processor-based system, or similar electronic computing device, that manipulates and transforms data represented as physical quantities within the processor-based system's storage into other data similarly represented as physical quantities within that storage or within other such information storage, transmission, or display devices.
In the description of the embodiments, reference may be made to accompanying drawings. In the drawings, like numerals describe substantially similar components throughout the several views. Other embodiments may be utilized and structural, logical, and electrical changes may be made. Moreover, it is to be understood that the various embodiments, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described in one embodiment may be included within other embodiments.
Further, a design of an embodiment that is implemented in a processor may go through various stages, from creation to simulation to fabrication. Data representing a design may represent the design in a number of manners. First, as is useful in simulations, the hardware may be represented using a hardware description language or another functional description language. Additionally, a circuit level model with logic and/or transistor gates may be produced at some stages of the design process. Furthermore, most designs, at some stage, reach a level of data representing the physical placement of various devices in the hardware model. In the case where conventional semiconductor fabrication techniques are used, data representing a hardware model may be the data specifying the presence or absence of various features on different mask layers for masks used to produce the integrated circuit. In any representation of the design, the data may be stored in any form of a machine-readable medium. An optical or electrical wave modulated or otherwise generated to transmit such information, a memory, or a magnetic or optical storage such as a disc may be the machine readable medium. Any of these mediums may “carry” or “indicate” the design or software information. When an electrical carrier wave indicating or carrying the code or design is transmitted, to the extent that copying, buffering, or re-transmission of the electrical signal is performed, a new copy is made. Thus, a communication provider or a network provider may make copies of an article (a carrier wave) that constitute or represent an embodiment.
Embodiments may be provided as a program product that may include a machine-readable medium having stored thereon data which when accessed by a machine may cause the machine to perform a process according to the claimed subject matter. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, DVD-ROM disks, DVD-RAM disks, DVD-RW disks, DVD+RW disks, CD-R disks, CD-RW disks, CD-ROM disks, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing electronic instructions. Moreover, embodiments may also be downloaded as a program product, wherein the program may be transferred from a remote data source to a requesting device by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
Many of the methods are described in their most basic form but steps can be added to or deleted from any of the methods and information can be added or subtracted from any of the described messages without departing from the basic scope of the claimed subject matter. It will be apparent to those skilled in the art that many further modifications and adaptations can be made. The particular embodiments are not provided to limit the claimed subject matter but to illustrate it. The scope of the claimed subject matter is not to be determined by the specific examples provided above but only by the claims below.