This application claims priority under 35 U.S.C. § 119 to Chinese Patent Application No. 200710105004.6 filed May 18, 2007, the entire contents of which are incorporated herein by reference.
The present invention relates to a lock mechanism for shared memory in a multi-core processor. More specifically, the present invention relates to a lock mechanism based on an address arbitrator for shared memory in a multi-core processor.
With the development of semiconductor technology, multi-core processors (for example, Cell processors) are widely used. Multi-threaded programs running on the cores of a multi-core processor must control concurrent access to shared memory regions. The common way to do so is to synchronize the threads with locks or semaphores. The efficiency of the lock/semaphore implementation is therefore a key factor in the performance of multi-threaded platforms. The implementation of a lock affects not only the overhead of synchronization operations, but also the time that threads spend blocked waiting for the lock to be released. This is even more critical to the success of current processors, which adopt multi-core, multi-threaded designs as an important technique for making full use of the die area.
Normally, lock/unlock operations are implemented as a combination of hardware-supported shared memory systems and atomic synchronization primitives, e.g., test-and-set (T&S), compare-and-swap (C&S), and load-linked/store-conditional (LL/SC). These hardware-supported shared memory systems provide a mechanism to block global memory accesses and communications while an atomic primitive is in progress, e.g., the bus lock in x86 processors. This works for traditional shared memory multi-processor platforms, since the memory interface/bus is the only way for processors to carry out global communications. However, for current or future multi-core processors, this mechanism degrades system performance in two respects:
1. All lock/unlock operations converge at the memory interface to resolve potential contention. The off-chip memory interface is already the bottleneck of the system, not only because of its bandwidth but also because of its latency, which is hundreds or thousands of times that of the on-chip cache. Even if the access conflict can be resolved in a shared on-chip L2/L3 cache, the overhead of the operation is still one order of magnitude higher.
2. More and more network topologies are adopted as the global interconnection in multi-core chips, to support concurrent data transactions/communications. For example, there is a ring network in the Cell processor.
The network as shown in
The illustrative embodiments of the present invention described herein provide a processor, a method, and a program storage device for processing lock transactions on shared memory in a multi-core processor.
An exemplary feature of an embodiment of the present invention is a processor consisting of one or more processing cores and an address arbitrator, where the one or more processing cores are configured to submit to the address arbitrator a lock transaction request corresponding to a specific instruction in response to the execution of the specific instruction, and the lock transaction request includes a lock variable address asserted on an address bus. The processor further consists of a lock controller for performing lock transaction processing in response to the lock transaction request, and notifying a processing result to the processing core from which the lock transaction request was sent out. The processor further consists of a switching device, coupled to the address arbitrator and the lock controller, for identifying the lock transaction request and notifying the lock transaction request to the lock controller.
Another exemplary feature of an embodiment of the present invention is a method for processing a lock transaction in a processor consisting of one or more processing cores, where one of the processing cores submits a lock transaction request corresponding to a specific instruction to an address arbitrator in response to the execution of the specific instruction. The method further consists of the step of asserting a lock variable address on an address bus. The method further consists of the step of identifying the lock transaction request. The method further consists of the step of performing the lock transaction processing and notifying the processing result to one of the one or more processing cores.
Another exemplary feature of an embodiment of the present invention is a program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform the steps of a method for processing a lock transaction in a processor with one or more processing cores. The method consists of one of the processing cores submitting a lock transaction request corresponding to a specific instruction to an address arbitrator in response to the execution of the specific instruction. The method further consists of the step of asserting a lock variable address on an address bus. The method further consists of the step of identifying the lock transaction request. The method further consists of the step of performing the lock transaction processing and notifying the processing result to one of the one or more processing cores.
Various other features, exemplary features, and attendant advantages of the present disclosure will become more fully appreciated as the same becomes better understood when considered in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the several views.
The figures form a part of the specification and are used to describe the embodiments of the invention and explain the principle of the invention together with the literal statement. The foregoing and other objects, aspects, and advantages will be better understood from the following non-limiting detailed description of preferred embodiments of the invention with reference to the drawings, wherein:
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings.
In the following description, an embodiment of the present invention will be described by referring to the structure of cell processor shown in
PUs 102, 103 and 104 are processing cores running application threads. A single PU may run a single thread or run a plurality of threads at the same time. Like the ring network in
According to an embodiment of the present invention, the bus interface further comprises signal lines for lock operations, i.e., “lock” signal, “acquire/release” signal and “lock value”. A lock transaction is usually divided into three phases:
Request phase. When a PU requests a lock transaction on a lock variable, the address of the lock variable is placed on the address bus to identify the lock variable; the “lock” signal is asserted to notify the address arbitrator and lock controller 101 that the present request is a lock transaction; and the type of the requested lock transaction, i.e., lock acquisition or lock release, is indicated through the “acquire/release” signal. In addition, information identifying the thread issuing the request may be provided to the address arbitrator and lock controller 101 through, for example, “lock value” or an additional signal line.
Processing phase. The address arbitrator and lock controller 101 performs corresponding processing (will be illustrated by referring to
Responding phase. For lock transactions, the “grant/reject” signal is used to indicate the result of the lock transaction request to the PU. For a lock transaction request from the PU, the address arbitrator and lock controller 101 may give one of three kinds of response in the next cycle. The first is “grant” (indicated by the “grant/reject” signal), i.e., the lock transaction request is processed successfully. The second is “reject” (indicated by the “grant/reject” signal), i.e., the lock transaction request fails. The third is “hold” (indicated by the “hold” signal), i.e., the lock transaction is paused because the lock variable involved in the lock transaction request is not in the address arbitrator and lock controller 101. In the third case, the address arbitrator and lock controller 101 further provides a lock ID to the PU through the “lock value” signal, to identify the paused lock transaction. When the requested lock variable has been loaded into the address arbitrator and lock controller 101, the address arbitrator and lock controller 101 proceeds to process the lock transaction request and returns the final result (“grant/reject” signal), identified by the lock ID (“lock value” signal), to the requesting PU. In the third case, the correspondence between the requesting thread and the returned lock ID is maintained in the PU, so that the relevant thread can be found when the final result is received.
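The three phases above can be illustrated with a small software model. This is a sketch only, not the hardware of the embodiment: the names `LockRequest`, `Response`, `Controller`, and `ProcessingUnit` are hypothetical, lock IDs are assumed to be small integers drawn from a counter, and the release path (incrementing the lock value) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Optional


class Result(Enum):
    GRANT = "grant"
    REJECT = "reject"
    HOLD = "hold"


@dataclass(frozen=True)
class LockRequest:
    address: int     # lock variable address placed on the address bus
    acquire: bool    # "acquire/release" signal: True = acquire, False = release
    thread_id: int   # requesting thread, carried e.g. on "lock value"


@dataclass(frozen=True)
class Response:
    result: Result
    lock_id: Optional[int] = None  # set on "hold" and on the later final result


class Controller:
    """Toy stand-in for the address arbitrator and lock controller."""

    def __init__(self, resident):
        self.resident = dict(resident)  # address -> lock value held on chip
        self._ids = count(1)            # assumption: lock IDs are integers
        self.pending = {}               # lock_id -> paused request

    def submit(self, req):
        if req.address not in self.resident:
            lock_id = next(self._ids)   # lock variable absent: pause ("hold")
            self.pending[lock_id] = req
            return Response(Result.HOLD, lock_id)
        return self._process(req)

    def _process(self, req, lock_id=None):
        value = self.resident[req.address]
        if not req.acquire:
            self.resident[req.address] = value + 1  # assumed release behaviour
            return Response(Result.GRANT, lock_id)
        if value > 0:
            self.resident[req.address] = value - 1
            return Response(Result.GRANT, lock_id)
        return Response(Result.REJECT, lock_id)

    def load(self, address, value):
        """The lock variable arrives; finish every transaction paused on it."""
        self.resident[address] = value
        ready = [(i, r) for i, r in self.pending.items() if r.address == address]
        finished = []
        for lock_id, req in ready:
            del self.pending[lock_id]
            finished.append(self._process(req, lock_id))
        return finished


class ProcessingUnit:
    """Keeps the lock ID -> thread correspondence for held transactions."""

    def __init__(self, controller):
        self.controller = controller
        self.waiting = {}   # lock_id -> thread_id
        self.results = {}   # thread_id -> final Result

    def request(self, address, acquire, thread_id):
        resp = self.controller.submit(LockRequest(address, acquire, thread_id))
        if resp.result is Result.HOLD:
            self.waiting[resp.lock_id] = thread_id
        else:
            self.results[thread_id] = resp.result
        return resp

    def deliver(self, resp):
        thread_id = self.waiting.pop(resp.lock_id)
        self.results[thread_id] = resp.result
```

In this model a PU that receives “hold” simply records the lock ID; when `Controller.load` later returns the final result, `ProcessingUnit.deliver` uses the recorded lock ID to find the thread that issued the request.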
An application can arbitrarily specify the memory location at an address as a lock variable because a specific lock variable is identified by the address on the address bus. Accordingly, the application is required to initialize a lock/semaphore before using the lock/semaphore, for example, writing an initial value or a magic number for lock transaction verification to the address. As stated above, a specific (lock/unlock) instruction is then used to perform atomic operation on the lock variable.
These lock signal operations by the PU on the bus interface 204 according to the specific instruction may be transparent for the program threads running on the PU. For example, for the multi-core processor (cell processor) shown in
The address arbitrator and lock controller 101 and the processing performed in response to the lock transaction requests will be described by referring to
By referring again to
The lock controller 203 is responsible for lookup table management, lock variable searching and updating, lock transaction processing, and so on. More specifically, when the lock controller 203 receives a lock transaction request from a PU through the bus interface 204, it obtains the address of the lock variable related to the lock request from the address bus, retrieves the lock variable corresponding to the address from the fast lock lookup table 202, modifies the retrieved lock variable according to the type of the lock transaction, and returns the result to the requesting PU. If no lock variable corresponding to the address is found in the fast lock lookup table, the lock controller 203 loads the variable via the requesting PU or directly from the memory or shared cache. If required, some format verification or conversion may be performed at the loading phase.
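The lookup-and-load behaviour described above might be modelled as follows. This is a sketch under stated assumptions: the 32-bit word layout (a 16-bit magic number in the upper half and a 16-bit count in the lower half), the constant `LOCK_MAGIC`, and the names `FastLockTable`, `encode_lock_word`, and `decode_lock_word` are hypothetical and are not taken from the embodiment.

```python
LOCK_MAGIC = 0x10CC  # hypothetical magic number written when a lock is initialized


def encode_lock_word(count):
    """Pack an initial count together with the magic number (assumed layout)."""
    if not 0 <= count <= 0xFFFF:
        raise ValueError("count does not fit in 16 bits")
    return (LOCK_MAGIC << 16) | count


def decode_lock_word(word):
    """Format verification at load time: reject words that lack the magic number."""
    if (word >> 16) & 0xFFFF != LOCK_MAGIC:
        raise ValueError("address does not hold an initialized lock variable")
    return word & 0xFFFF


class FastLockTable:
    """Address-indexed table of lock variables, filled on demand."""

    def __init__(self, memory):
        self.memory = memory  # stands in for the memory or shared cache
        self.entries = {}     # address -> current lock value
        self.misses = 0       # number of loads triggered by a missing entry

    def lookup(self, address):
        if address not in self.entries:
            # Miss: load the variable and verify its format before caching it.
            self.misses += 1
            self.entries[address] = decode_lock_word(self.memory[address])
        return self.entries[address]

    def update(self, address, value):
        self.lookup(address)  # make sure the entry is resident first
        self.entries[address] = value
```

Only the first access to an address pays for the load; later lock transactions on the same address are served from the table.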
An exemplary procedure of lock operation will be described by referring to
According to an embodiment of the present invention, if the lock variable value is larger than zero, then at step S18, the lock controller 203 asserts the “grant” signal through the bus interface 204 as a response to the requesting PU, and the PU successfully acquires the lock. At the same time, the lock controller 203 decrements the value of the lock variable, and updates the lookup table entry with the new value and owner (PU). If the lock variable value is less than or equal to zero, then at step S20, the lock controller 203 asserts the “reject” signal through the bus interface 204 as a response to the requesting PU. The lock acquisition operation fails, or a zero is returned for the T&S instruction.
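The grant/reject decision described above can be sketched in a few lines. The entry fields (`value`, `owner`) follow the description; the class and function names are hypothetical, and the release path (incrementing the value and clearing the owner, permitted only for the recorded owner) is an assumption, since only acquisition is described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockEntry:
    value: int                   # lock variable value; > 0 means available
    owner: Optional[str] = None  # PU that most recently acquired the lock


def acquire(entry, requester):
    """Return True for "grant" and False for "reject"."""
    if entry.value > 0:
        entry.value -= 1         # decrement the lock variable
        entry.owner = requester  # record the new owner in the table entry
        return True
    return False                 # value <= 0: the acquisition fails


def release(entry, requester):
    """Assumed release path: only the recorded owner may release."""
    if entry.owner != requester:
        return False
    entry.value += 1
    entry.owner = None
    return True
```

With an initial value of 1 the entry behaves as a mutual-exclusion lock; a larger initial value would behave as a counting semaphore, for which tracking a single owner as done here would be too simple.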
Although the instruction execution portion of the PU in the embodiment is required to identify the special instructions relating to lock operations, it is also possible to perform lock variable access by using a specially designated memory region or specific addresses with identifiable characteristics. In the latter case, if the instruction execution portion identifies that the address related to an instruction falls within the memory region or belongs to the specific addresses, the instruction is treated as a lock operation.
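The address-based alternative can be illustrated by a simple classifier. This is a minimal sketch: the region bounds, the tag bit, and the function names are hypothetical values chosen only for illustration.

```python
LOCK_REGION_START = 0xFFFF0000  # hypothetical base of a designated lock region
LOCK_REGION_END = 0xFFFF1000    # hypothetical exclusive upper bound
LOCK_TAG_BIT = 1 << 40          # hypothetical "identifiable characteristic"


def in_lock_region(address):
    """True if the address falls within the designated memory region."""
    return LOCK_REGION_START <= address < LOCK_REGION_END


def has_lock_tag(address):
    """True if the address carries the assumed tag bit."""
    return bool(address & LOCK_TAG_BIT)


def classify(address):
    """Decide whether an access is routed as a lock operation or as ordinary memory."""
    if in_lock_region(address) or has_lock_tag(address):
        return "lock"
    return "memory"
```

Either predicate lets ordinary load/store instructions reach the lock path without dedicated lock instructions.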
Although the embodiments of the present invention have been described by referring to a multi-core processor, a person skilled in the art knows that, because of the use of the lock ID and owner field, different threads in the same core are able to identify responses to their respective lock requests, and for the same lock variable, the lock controller is able to discriminate between different threads in the same core. Therefore, the present invention is also applicable to a single-core processor (a special case of the multi-core processor).
Although examples of specific signal lines have been provided to illustrate the interface between the PU and the address arbitrator and lock controller, one skilled in the art knows that, the present invention is not limited to these specific examples, but is able to be modified according to specific needs to perform processing relating to lock transactions.
The above-disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments that fall within the true spirit and scope of the present invention. Thus, to the maximum extent allowed by law, the scope of the present invention is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description.
While the present invention has been described with reference to what are presently considered to be the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. On the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
Number | Date | Country | Kind |
---|---|---|---|
200710105004.6 | May 2007 | CN | national |