The present invention relates generally to a processing method and apparatus of a computer system, and in particular to a method and apparatus for lock allocation control.
A multi-core processor is a single chip that contains a plurality of processor cores. The chip can be inserted directly into a single processor slot, but the operating system utilizes all of the associated resources, so that each processor core is used as a separate logical processor. By dividing tasks between two processor cores, a chip that contains multiple processor cores can perform more tasks during a given clock period. Multi-core technology enables a server to handle tasks in parallel. A multi-core system is easier to expand and can pack stronger processing performance into a more compact size, which consumes less power and produces less heat.
While bringing more computational power, multi-core technology also presents programmers with the great challenge of how to use the cores efficiently. Lock technology based on shared memory has long been one of the essential approaches adopted by programmers to provide mutually exclusive access to a shared resource in shared memory. In a multi-core system, for example a dual-core system, suppose two cores A and B want to use the same lock. When core A has acquired the lock, core B is blocked until A releases the lock. During this time only one of the two CPU cores is in use and the other is idle. Contention for a lock among a plurality of cores thus causes execution to become serial, thereby substantially reducing multi-core performance.
The present invention provides a novel method and apparatus for lock allocation control. According to the technical solution of the invention, when a processor core acquires a lock, the other processor cores do not need to constantly poll memory to check whether the required lock has been released. Instead, the other processor cores are in a sleep state, and the invention selectively wakes up the next processor core based on a predetermined rule, such that an out-of-order lock contention procedure is turned into an in-order lock allocation procedure. By selectively waking up a processor core that is in the sleep state, the invention can avoid occupying a large amount of bus bandwidth and can reduce the power consumption of the chip. Further, by optimizing the predetermined rule, the invention can also increase the probability of obtaining a data resource from cache, thereby reducing the occurrence of cache misses.
Specifically, the invention provides a method for performing lock allocation for a plurality of processor cores, wherein the processor cores are located in a computer node, and wherein a first processor core has acquired a lock while other processor cores that need to acquire said lock are in a sleep state, the method including: receiving a signal that the first processor core has released said lock; determining, based on a predetermined rule for allocating said lock, a second processor core that should be woken up from among the other processor cores that need to acquire said lock and are in the sleep state; and waking up the second processor core to enable it to acquire said lock.
The invention further provides a lock allocation controller for performing lock allocation for a plurality of processor cores, wherein the processor cores are located in a computer node, and wherein a first processor core has acquired a lock while other processor cores that need to acquire said lock are in a sleep state, the lock allocation controller including: a lock state change receiving means for receiving a signal that the first processor core has released said lock; a target core determining means for determining, based on a predetermined rule for allocating said lock, a second processor core that should be woken up from among the other processor cores that need to acquire said lock and are in the sleep state; and a target core waking up means for waking up the second processor core to enable it to acquire said lock.
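The three steps of the method above (receive the release signal, determine the next core, wake it) can be illustrated with a minimal sketch. It assumes a first in first out rule and models sleep and wake-up as plain bookkeeping; the class name `LockAllocationController` and its methods are hypothetical and are not taken from the specification.

```python
from collections import deque


class LockAllocationController:
    """Hypothetical single-node sketch; the predetermined rule is assumed to be FIFO."""

    def __init__(self):
        self.holder = None       # core currently holding the lock, or None if idle
        self.sleeping = deque()  # cores asleep while waiting, in request order
        self.woken = []          # record of wake-up signals, for illustration only

    def request(self, core):
        """Grant the lock if it is idle; otherwise register the core as sleeping."""
        if self.holder is None:
            self.holder = core
            return True
        self.sleeping.append(core)
        return False

    def release(self, core):
        """Receive the release signal, determine the next core, and wake it."""
        if core != self.holder:
            raise ValueError("only the current holder can release the lock")
        self.holder = None
        if self.sleeping:
            next_core = self.sleeping.popleft()  # FIFO: earliest requester first
            self.woken.append(next_core)         # stands in for the wake-up signal
            self.holder = next_core
        return self.holder
```

In this sketch a waiting core never polls the lock state; it is only touched again when `release` selects it, which is the in-order allocation behaviour the method describes.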
The invention also provides a computer system, including a plurality of processor cores, at least one cache, and the lock allocation controller as described above.
The above description illustrates some advantages of the invention on the whole, and these and other advantages thereof will become more apparent from drawings in conjunction with detailed description of the preferred embodiment of the invention.
The drawings referred to in the description are only used to illustrate typical embodiments of the invention, and should not be considered as a limitation on the scope of the invention.
In the following discussion, numerous specific details are provided to facilitate a thorough understanding of the invention. However, it is evident to those skilled in the art that the invention can be understood without these specific details. It will also be recognized that any of the following specific terms is used merely for convenience of description, and thus the invention should not be limited to any specific application that is identified and/or implied by such terms.
Unless otherwise stated, the functions described in the invention may be performed by software, hardware, or a combination thereof. However, in an embodiment, unless otherwise stated, these functions are performed by processors (such as computers or electronic data processors) based on encoded integrated circuits (such as those encoded by computer programs).
A unique feature of the invention is that a lock allocation controller is provided in computer node N1, such that a processor core can occupy and release a lock without accessing memory through the bus; instead, information associated with the lock may be stored in the computer node. This can reduce the waste of bus resources and can also reduce the time delay caused by accessing memory through the bus. As can be appreciated by those skilled in the art, a processor core accesses memory through the bus significantly more slowly than it accesses the inside of the computer node. The computer node can not only store lock state information but also host the associated operation logic, such that it can selectively wake up the processor cores that are in the sleep state based on a predetermined rule.
The lock state change receiving means is used to receive a change of lock state from a processor core. In particular, according to an embodiment of the invention, bit 1 represents that the lock state is idle, and bit 0 represents that the lock is currently occupied. When the lock state is idle (i.e. the value of the lock state is 1) and the lock allocation controller receives, through the lock state change receiving means, a request from a processor core to access a certain lock, it modifies the lock state value to 0, so that other processor cores know that this lock has been occupied. It can be known from the content in the lock information storage table of
Policy records the predetermined rule for managing lock allocation. According to one embodiment of the invention, the predetermined rule is a first in first out rule; that is, for a plurality of processor cores that are all in the sleep state waiting for a certain lock, the lock allocation controller preferentially wakes up the processor core that issued its lock request first. According to another embodiment of the invention, the predetermined rule is a round-robin rule; that is, for a plurality of processor cores that are all in the sleep state waiting for a certain lock, the lock allocation controller calculates a round-robin queue based on the round-robin rule, and preferentially wakes up the processor core that has the highest priority in the round-robin queue. The principle of the round-robin rule is to allocate the lock to the requesting processor cores in turn. Of course, the invention is not limited to these two predetermined rules; rather, any predetermined rule can be applied to allocate the lock. As shown in lock information storage table in
The target core determining means is used to judge which processor core that is in sleep state may be woken up based on predetermined rule after lock state value is changed from 0 to 1. According to the embodiment in
Applying lock allocation controller in multiple computer nodes differs from applying lock allocation controller in a single computer node in that, a same lock needs to be allocated among a plurality of computer nodes, so there is a need for a mechanism to ensure that a plurality of lock allocation controllers can coordinate with each other on the allocation of a same lock and to further reduce time delay due to inter node communication. The coordination mechanism will be described in detail in
The lock allocation controller in N1 includes a lock state change receiving means, a lock information storage table, a target core determining means, a target core waking up means, an inter-node communicating means, and preferably includes a first in first out queue (FIFO queue). The lock information storage table stores therein associated information of each lock, including lock identifier (Lock ID), lock state value (Valid), whether a Home Note, also referred to as home note, is contained, local core in waiting, remote node in waiting, computer node that is occupying lock (Current holder) and predetermined rule (Policy).
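One possible in-memory layout for an entry of the lock information storage table is sketched below. The field names are hypothetical renderings of the listed columns, and the sketch assumes the lock state value uses the same convention as the single-node embodiment (1 for idle, 0 for occupied).

```python
from dataclasses import dataclass, field
from typing import List, Optional

IDLE, OCCUPIED = 1, 0  # assumed: same bit convention as the single-node embodiment


@dataclass
class LockEntry:
    """Hypothetical record for one row of the lock information storage table."""
    lock_id: int                                              # Lock ID
    valid: int = IDLE                                         # lock state value (Valid)
    is_home: bool = False                                     # whether the home note is held here
    local_waiting: List[str] = field(default_factory=list)    # local cores in waiting
    remote_waiting: List[str] = field(default_factory=list)   # remote nodes in waiting
    current_holder: Optional[str] = None                      # node occupying the lock
    policy: str = "Locality/FIFO/Distance"                    # predetermined rule (Policy)


def occupy(entry: LockEntry, node: str) -> bool:
    """Mark the lock as occupied by `node` if it is currently idle."""
    if entry.valid != IDLE:
        return False
    entry.valid = OCCUPIED
    entry.current_holder = node
    return True


def release(entry: LockEntry) -> None:
    """Return the lock to the idle state."""
    entry.valid = IDLE
    entry.current_holder = None


# A table keyed by lock identifier, holding the home note of lock 1.
table = {1: LockEntry(lock_id=1, is_home=True)}
```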
The lock state change receiving means is used to receive a change of lock state from processor core, including receiving lock request and lock release signal. In order to coordinate lock information storage tables in respective lock allocation controllers, according to one embodiment of the invention, one home note and several auxiliary notes are established for each lock, and these notes are deployed in lock allocation controllers of different computer nodes respectively. As shown in
It can be known from the content in lock information storage table in
According to an embodiment of the invention, whether a lock allocation controller contains the home note can be judged from whether it contains a value of the home note. There are various ways of allocating home notes. The basic idea can be divided into two types, of which the first is to allocate a plurality of locks as evenly as possible among different computer nodes. If there are 999 locks in total, then the 999 home notes of the 999 locks may be evenly divided into three portions of 333 locks each, so that the lock allocation controller of each computer node contains 333 home notes and 666 auxiliary notes. Auxiliary notes are described in detail below. There are also various types of logic for allocating locks; a simpler approach is to perform a modular operation (such as modulo 3) on the ID number of a lock, and then allocate home notes based on the remainder (0, 1 or 2) of the operation. According to an embodiment of the invention, a processor core may perform the modulo 3 operation each time it accesses the lock allocation controller, so as to calculate the computer node that stores the home note of the lock. According to another embodiment of the invention, one bit in the lock information storage table can be used to identify whether the note is a home note; in the example of
A second way to allocate home notes is to place, as far as possible, the home note of a lock in the lock allocation controller corresponding to the processor core that frequently needs to use the lock, thereby reducing the time delay due to synchronizing the auxiliary note with the home note and further optimizing the performance of lock allocation. Programmers can either manually allocate the home note of a lock to a frequently accessing computer node based on their own experience, or judge which lock is more frequently accessed by which computer node based on feedback from system operation; that is, they can collect statistics on the feedback results so as to create a recommended scheme for allocating the home notes of locks.
Moreover, the invention can also store only home notes and no auxiliary notes. Accordingly, if a processor core cannot find the home note of the requested lock in the lock allocation controller of the node where that core is located, it can communicate with the computer node where the home note is located to acquire the requested lock, or that core may be placed in a waiting queue.
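The two ways of allocating home notes described above can be sketched as follows. The node names N1 to N3 follow the description, while the function names, the mapping from remainder to node, and the access-log format are assumptions made only for illustration.

```python
from collections import Counter

NODES = ["N1", "N2", "N3"]


def home_by_modulo(lock_id, nodes=NODES):
    """First way: spread home notes evenly using the lock ID modulo the node count.

    Mapping remainder 0, 1, 2 to N1, N2, N3 is an assumed convention.
    """
    return nodes[lock_id % len(nodes)]


def home_by_frequency(access_log):
    """Second way: place each home note on the node that accesses the lock most.

    `access_log` is a hypothetical iterable of (lock_id, node) pairs collected
    from system operation feedback.
    """
    counts = {}
    for lock_id, node in access_log:
        counts.setdefault(lock_id, Counter())[node] += 1
    return {lock_id: c.most_common(1)[0][0] for lock_id, c in counts.items()}


# With lock IDs 1..999 and three nodes, each node receives 333 home notes.
distribution = Counter(home_by_modulo(i) for i in range(1, 1000))
```

The modulo variant needs no stored state, since any core can recompute the home node from the lock ID, whereas the frequency variant requires a statistics-gathering phase before a recommended allocation can be produced.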
The predetermined rule for lock allocation is recorded in the Policy field of the lock information storage table. Locality/FIFO/Distance represents that, when processor cores from different computer nodes all want to acquire lock 1, a local processor core is woken up preferentially, and the control right of the lock is delivered to a remote computer node only when all the local processor cores have ended their occupation of lock 1. If two or more local processor cores want to occupy lock 1, the lock allocation controller preferentially allocates lock 1 to the processor core (0010) that precedes the others in time sequence, according to the FIFO rule. If two or more remote computer nodes (such as N2 and N3) contain processor cores that are in the sleep state waiting to occupy lock 1, then the lock allocation controller preferentially allocates lock 1 to the remote computer node that is physically closest to the local computer node (N1); if the physical distance between N2 and N1 is shorter than the physical distance between N3 and N1, a processor core in N2 will occupy lock 1 after the processor cores in N1 have finished occupying lock 1. This further reduces the time delay in allocating the lock and optimizes the performance of lock allocation. Further, there may be two embodiments for achieving the occupation of lock 1 by a processor core in N2. According to the first embodiment, the lock allocation controller in N1 notifies the lock allocation controller in N2, and the processor core in N2 is then woken up by the lock allocation controller in N2. According to the second embodiment, the lock allocation controller in N1 directly wakes up the processor core in N2; in this case, the lock allocation controller in N1 needs to record the remote processor core that needs to acquire lock 1 and its computer node.
As can be appreciated by those skilled in the art, the predetermined rule may have many variations. For example, if the predetermined rule is Locality/FIFO/FIFO, the local computer node has priority over remote computer nodes, the lock is allocated locally in first in first out order, and the lock is also allocated among different remote computer nodes in first in first out order. Further, if the predetermined rule is Locality/Round-Robin/FIFO, the local computer node has priority over remote computer nodes, the lock is allocated locally based on a preference sequence obtained from the round-robin rule, and the lock is allocated among different remote computer nodes in first in first out order. Still further, if the predetermined rule is FIFO, processor cores occupy the lock in first in first out order regardless of whether they are local or remote; in this case, the FIFO queue records not only the identifiers of local processor cores, but the identifiers of all the processor cores that need to occupy the lock and the identifiers of the computer nodes corresponding to these processor cores.
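As an illustration of how a three-part rule such as Locality/FIFO/Distance could be evaluated, the sketch below picks the next holder from a local FIFO queue and a set of waiting remote nodes. The function name, the return format, and the numeric distance table are assumptions; the specification does not define how physical distance is measured.

```python
def next_holder(local_fifo, remote_nodes, distance):
    """Apply a Locality/FIFO/Distance rule (illustrative sketch only).

    local_fifo   -- local core identifiers in request order (earliest first)
    remote_nodes -- remote nodes that have sleeping cores waiting for the lock
    distance     -- hypothetical mapping from remote node to distance from the local node
    """
    if local_fifo:
        # Locality first, then FIFO among local cores.
        return ("local", local_fifo[0])
    if remote_nodes:
        # No local waiter left: hand over to the closest remote node.
        return ("remote", min(remote_nodes, key=lambda n: distance[n]))
    return None  # nobody is waiting; the lock simply becomes idle


# Core 0010 requested first; 0001 is a made-up second local waiter.
choice = next_holder(["0010", "0001"], ["N2", "N3"], {"N2": 1, "N3": 2})
```

Swapping the last branch for a queue of remote nodes in arrival order would give the Locality/FIFO/FIFO variation, and replacing `local_fifo[0]` with a rotating index would approximate Locality/Round-Robin/FIFO.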
Target core determining means is used to judge which of the local processor cores that are in sleep state will be woken up based on predetermined rule after lock state value is changed from 0 to 1. According to the embodiment in
As a variation to the above embodiment, the invention will not distinguish home note from auxiliary note, and will set values of home note and auxiliary note in lock allocation controller to be completely identical. Thus, after all the processor cores in a node have ended occupation of lock 1, each computer node can directly deliver control right of lock 1 to another computer node without having to communicate with the computer node where home note is located. For example, N1, N2, N3 all need to occupy lock 1, after N1 has ended occupation of lock 1, control right is delivered to N2, and after N2 has ended occupation of lock, control right is directly delivered to N3; in order to keep synchronization among the lock allocation controllers, N1 will confirm that N2 has delivered control right of lock 1 to the next computer node.
According to the embodiment in
Based on predetermined rule of Locality/FIFO/Distance of lock 1, once N1 issues a node waking up signal to N2 through the inter-node communicating means, N2 will judge which local processor core should be woken up based on its own auxiliary note. When processor core in N2 ends the occupation of lock 1 in a sequence of first in first out, N2 will send a return signal to N1 through the inter-node communication means, and give control right of lock 1 back to N1 again. Thus, processor core of each computer node can complete occupying and releasing operation of lock by merely communicating with local lock allocation controller.
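The hand-off and return of the control right between N1 and N2 can be traced with a small sequential simulation. It is only a sketch: the `Node` class, the trace format, and the message labels are hypothetical, and real inter-node signalling would be asynchronous.

```python
from collections import deque


class Node:
    """Hypothetical computer node with a local FIFO of sleeping cores."""

    def __init__(self, name, waiting):
        self.name = name
        self.waiting = deque(waiting)

    def serve_local(self, trace):
        """Wake local waiters one by one in FIFO order; each holds, then releases."""
        while self.waiting:
            core = self.waiting.popleft()
            trace.append((self.name, core))


def run(home, remote):
    """Home node serves its own cores, wakes the remote node, then gets control back."""
    trace = []
    home.serve_local(trace)                                # locality: local cores first
    trace.append((home.name, "wake->" + remote.name))      # node waking up signal
    remote.serve_local(trace)                              # remote uses its auxiliary note
    trace.append((remote.name, "return->" + home.name))    # return signal
    return trace


trace = run(Node("N1", ["0010"]), Node("N2", ["1000", "0100"]))
```

Each core in the trace interacts only with the controller of its own node; the two inter-node messages are the only traffic that crosses node boundaries.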
After C1 (1000) in N2 has released lock 1, C2 (0100) in N2 occupies lock 1 in turn. At this time, the hardware thread on C2 does not need to access memory again in order to read or write the data resource; rather, it may first attempt to obtain the data resource corresponding to lock 1 from the cache of N2. If the corresponding data resource is stored in the cache of N2, C2 does not need to access memory, thereby saving bus resources and the time needed to access the data resource. If the corresponding data resource is not stored in the cache of N2, for example because the data in the cache has been updated, then C2 accesses memory again to obtain the needed data resource.
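The cache-first access pattern can be modelled in a few lines. This is a simplified sketch in which the node cache and main memory are plain dictionaries and the key name is invented; it illustrates only why consecutive holders on the same node tend to hit the cache.

```python
def read_resource(key, cache, memory, stats):
    """Try the node-local cache first and fall back to memory on a miss."""
    if key in cache:
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1
    value = memory[key]   # slow path: goes over the bus to memory
    cache[key] = value    # later holders on the same node can now hit
    return value


n2_cache = {}
memory = {"lock1_data": 42}   # hypothetical data resource protected by lock 1
stats = {"hits": 0, "misses": 0}

first = read_resource("lock1_data", n2_cache, memory, stats)   # C1 (1000): miss
second = read_resource("lock1_data", n2_cache, memory, stats)  # C2 (0100): hit
```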
Specifically,
If it is judged in step 803 that the lock state in the home note of the first lock is occupied, a sleep signal is sent to the first processor core in step 813, such that it enters the sleep state and does not constantly poll the lock state information of the first lock. The first processor core is registered in a local FIFO queue in step 815 to wait for a subsequent wake-up operation. The FIFO queue herein is merely illustrative, and any other algorithm may be used to order the processor cores that are in the sleep state. After the first lock is released, the first processor core is selectively woken up based on the predetermined rule in step 817, and the information in the home note is updated in step 819, which includes deleting the first processor core from the value of the processor cores in the home note that are in the sleep state, and shifting and updating the information of the processor cores in the FIFO queue correspondingly.
If it is judged that lock state of the first lock in the home note is occupied in step 905, a sleep signal is sent to the first processor core to enable it to enter into sleep state. The first processor core is registered in a local FIFO queue to wait for processing in order in step 917. After the first lock is released, the first processor core is selectively woken up based on predetermined rule in step 919. And, information in home note is updated in step 921, which includes deleting the first processor core from the local processor cores that are in sleep state, shifting and updating information of processor cores in the FIFO queue correspondingly.
If it is queried that the first lock is not occupied by other processor core of computer node where the first processor core is located in step 1001, then it is judged whether lock state in the home note is idle in step 1007. As can be appreciated by those skilled in the art, if home note is synchronized with auxiliary note, the auxiliary note can also be queried as to whether lock state is idle. In summary, when the lock state of the first lock is idle, a signal is sent to the first processor core to allow it to occupy the first lock in step 1009. And, information in home note and auxiliary note are updated in step 1011, which further includes updating lock state information of the first lock in home note and auxiliary note and information in the computer node that is occupying the lock.
When the first processor core ends the occupation of the first lock, a signal that the first processor core has released the first lock is received in step 1013. Information in home note and auxiliary note are updated in step 1015, which includes updating lock state information in home note and auxiliary note and information in the computer node that is occupying the lock.
If it is judged that the lock state in the home note is occupied in step 1007, a sleep signal is sent to the first processor core such that it enters into sleep state in step 1017. And, the first processor core is registered in a local FIFO queue in step 1019.
After the first lock is released, the first processor core is selectively woken up based on predetermined rule to enable it to occupy the first lock in step 1021, and information in the auxiliary note or home note is updated in step 1023, which includes updating the computer node that is occupying lock to the computer node where the first processor core is located. And, updating of information in an auxiliary note further includes deleting the first processor core from the local processor cores that are in sleep state, shifting and updating information of processor cores in the FIFO queue correspondingly.
Various embodiments of the invention can provide many advantages, including those that are illustrated in summary of the invention and those that can be derived from technical solution per se. However, whether one embodiment can gain all advantages and whether such advantages are considered as a substantial improvement should not be considered as a limitation to the invention. Meanwhile, various implementations mentioned above are merely for illustration purpose, those skilled in the art can make various modifications and alterations to the above implementations without departing from the substance of the invention. The scope of the invention is entirely defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
200910261073.5 | Dec 2009 | CN | national |