This application claims the benefit of Korean Patent Application No. 10-2021-0133768, filed Oct. 8, 2021, and No. 10-2022-0104515, filed Aug. 22, 2022, which are hereby incorporated by reference in their entireties into this application.
The present invention relates generally to synchronization technology for improving the concurrent read performance of a critical section in distributed shared memory, and more particularly to new read-write synchronization technology for solving a performance degradation problem caused due to attempts to simultaneously read a critical section when a read-write synchronization method is used in distributed shared memory.
As multicore/manycore systems in which a number of CPU cores is installed are widely used, parallel programming, which improves performance using multiple cores, becomes more important and replaces technology for improving performance by merely increasing the operation clock speed of a CPU and memory.
Parallel programming, continuously developed and used in a High-Performance Computing (HPC) field, enables users to execute and control parallel programs using a parallel programming interface represented by a Message Passing Interface (MPI) or OpenMP. However, the existing HPC field mainly deals with workloads for numerical analysis and computation, which can be relatively easily parallelized, so a runtime load caused due to a task for parallelizing programs or the parallel program itself is not a great concern.
However, as manycore systems are popularized and as various and complex layers of software, including an operating system, a runtime framework, applications, and the like, run in a manycore environment, a parallelization task for efficiently using multiple cores becomes difficult and complicated.
Particularly, most systems have recently adopted Non-Uniform Memory Access (NUMA) in order to increase the system scale, and this increases the complexity of a parallelization problem. Also, a performance problem was exacerbated and has reached a serious level in a multi-node manycore system based on Distributed Shared Memory (DSM).
Distributed shared memory (DSM) is technology capable of increasing a memory volume by sharing memory units installed in multiple physical nodes through a high-speed interconnect, and is receiving attention again with the recent rapid improvement of interconnect performance and the emergence of Many-to-One Virtualization, which abstracts multiple nodes as a single large virtual machine. However, in spite of a fast interconnect, the performance of DSM is still far below that of local memory using a system bus, so a high memory performance load is caused.
In the process of parallelizing workloads, performance degradation is mostly due to a process of synchronization of access to data shared by multiple processes or threads. Such a synchronization process is performed using a lock mechanism, but, as is already known, the performance of a manycore system is rapidly degraded as a lock is frequently used for data synchronization. Particularly, when there is high contention for the use of a lock, multiple cores attempt to access a single lock variable, which increases cache-line bouncing (the process of repeatedly invalidating a cache value for each core and fetching a new value according to a cache consistency management policy) and rapidly decreases system performance.
A readers-writer lock is a synchronization mechanism in which, when only read-only requests are present, concurrent access to critical section data is allowed, but while a write request task holds a lock, concurrent access is not allowed. This is one of widely used synchronization mechanisms, because it has an effect of improving the scalability of a manycore system in a load condition in which most requests are read requests, but another performance problem may occur in distributed shared memory. When the existing readers-writer lock is used in distributed shared memory, performance degradation may be caused even when most requests are read requests. This is because multiple threads are allowed to simultaneously enter a critical section when there are only read-only requests, but the threads simultaneously try to change the value of a shared lock variable. This occurs also in a manycore system based on a single node, but distributed shared memory has lower performance than local memory, and a communication load between nodes may also be significantly increased by multiple requests to change the value of the shared lock variable. This load not only offsets performance benefits, which are acquired when most requests are read requests, but also results in overall performance degradation.
Therefore, when multiple concurrent reads are requested in distributed shared memory, it is difficult to improve performance using the existing readers-writer lock mechanism.
An object of the present invention is to mitigate performance degradation caused due to a shared lock variable when concurrent reads of a critical section are attempted in a distributed shared memory environment including multiple physical nodes.
Another object of the present invention is to assign a lock variable for each node and efficiently manage the same, thereby providing a new read-write synchronization mechanism capable of minimizing performance degradation caused due to sharing of a lock variable when a read lock is held.
A further object of the present invention is to improve the degree of parallelism of a system by improving the concurrent read performance of a critical section and to improve overall system performance.
In order to accomplish the above objects, a synchronization method for improving concurrent read performance of a critical section in distributed shred memory, performed by a distributed-shared-memory management apparatus in a physical node of a multi-node system, according to the present invention includes checking whether a lock is held on each node based on a read-write lock having lock variables for respective nodes in a distributed shared memory environment, acquiring a lock for a read operation or a write operation in consideration of whether the lock is held on each node, and releasing the lock based on the lock variables for the respective nodes when the read operation or the write operation is terminated.
Here, the read-write lock may have an array form including multiple entries, the number of which corresponds to the maximum number of physical nodes in the multi-node system.
Here, each of the multiple entries may include each of the lock variables for the respective nodes, and the multiple entries may be aligned so as to correspond to a minimum management unit size corresponding to the distributed shared memory environment.
Here, the values of the lock variables for the respective nodes, included in the multiple entries, are checked, whereby whether a read lock is held or whether a write lock is held may be checked.
Here, when a read lock is held on a current node, the lock for the read operation may be acquired by increasing the value of a lock variable included in an entry corresponding to the current node, among the multiple entries.
Here, when a read lock or a write lock is held on one or more nodes, release of the read lock or the write lock is waited for, after which the lock for the write operation may be acquired by changing the values of the lock variables for the respective nodes, included in the multiple entries, to a write lock acquisition state.
Here, the lock acquired for the read operation may be released by decreasing the value of a lock variable included in an entry corresponding to a current node, among the multiple entries.
Here, the lock acquired for the write operation may be released by initializing the lock variables for the respective nodes, included in the multiple entries.
Also, an apparatus for managing distributed shared memory according to an embodiment of the present invention includes a processor for checking whether a lock is held on each node based on a read-write lock having lock variables for respective nodes in a distributed shared memory environment, acquiring a lock for a read operation or a write operation in consideration of whether the lock is held on each node, and releasing the lock based on the lock variables for the respective nodes when the read operation or the write operation is terminated; and memory for storing the read-write lock.
Here, the read-write lock may have an array form including multiple entries, the number of which corresponds to the maximum number of physical nodes in a multi-node system.
Here, each of the multiple entries may include each of the lock variables for the respective nodes, and the multiple entries may be aligned so as to correspond to a minimum management unit size corresponding to the distributed shared memory environment.
Here, the processor may check the values of the lock variables for the respective nodes, included in the multiple entries, thereby checking whether a read lock is held or whether a write lock is held.
Here, when a read lock is held on a current node, the processor may acquire the lock for the read operation by increasing the value of a lock variable included in an entry corresponding to the current node, among the multiple entries.
Here, when a read lock or a write lock is held on one or more nodes, the processor may wait for release of the read lock or the write lock and then acquire the lock for the write operation by changing the values of the lock variables for the respective nodes, included in the multiple entries, to a write lock acquisition state.
Here, the processor may release the lock acquired for the read operation by decreasing the value of a lock variable included in an entry corresponding to a current node, among the multiple entries.
Here, the processor may release the lock acquired for the write operation by initializing the values of the lock variables for the respective nodes, included in the multiple entries.
The above and other objects, features, and advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
The present invention will be described in detail below with reference to the accompanying drawings. Repeated descriptions and descriptions of known functions and configurations which have been deemed to unnecessarily obscure the gist of the present invention will be omitted below. The embodiments of the present invention are intended to fully describe the present invention to a person having ordinary knowledge in the art to which the present invention pertains. Accordingly, the shapes, sizes, etc. of components in the drawings may be exaggerated in order to make the description clearer.
Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
Generally, a multi-node system based on distributed shared memory may correspond to a structure such as that illustrated in
Referring to
Here, a distributed-shared-memory manager 212 or 222 in the host operating system manages the local memory and communicates with another distributed-shared-memory manager 222 or 212 in a remote node through an interconnect between the physical nodes, thereby accessing remote memory and using the same.
For example, application #1210 being executed in physical node 1 may use both the local memory 213 and the remote memory 223 in physical node 2 through the distributed-shared-memory manager 212. Also, application #2220 being executed in physical node 2 may use both the local memory 223 and the remote memory 213 in physical node 1 through the distributed-shared-memory manager 222.
As described above, memory in each node may be simultaneously accessed by processes or threads executed in different nodes, in which case a synchronization task for shared data is required.
Here, as illustrated in
Meanwhile, it can be seen that application #1310 preforms a write operation on memory region B, but application #2320 performs a read operation thereon. If application #1310 preempts a write lock, application #2320 is not able to access memory region B until application #1310 releases the write lock. Likewise, if application #2320 preempts a read lock, application #1 has to wait until application #2320 releases the read lock.
That is, when all processes or threads perform only read operations on a shared memory region, they can simultaneously access the shared memory region, so system scalability should not be affected thereby. However, in the case of distributed shared memory, which exhibits low performance when remote memory is accessed, even when only read operations are performed, serious performance degradation may be caused.
Here, a general read-write lock provides a total of four operations corresponding to a lock and an unlock for each of a read operation and a write operation. A read-write lock may be implemented using different lock variables for a read operation and a write operation, but the present invention describes an implementation method in which all of a read operation and a write operation are managed using a single lock variable.
Hereinafter, a general operation process using an existing read-write lock will be described in detail with reference to
Here, variable b illustrated in
First,
Referring to
If the value of variable b is equal to WRITE_ACQUIRED, this indicates that a write lock is held by another process or thread. Accordingly, after waiting for the release of the write lock at step S410, the value of variable b may be checked again.
When it is determined at step S405 that the value of variable b is not equal to WRITE_ACQUIRED, this indicates that a read-write lock is not held by any process or thread or that another reader is present, so a read lock may be acquired. Accordingly, the value of variable b is incremented by 1 at step S420, and the read lock operation may be terminated.
Referring to
Referring to
When it is determined at step S605 that the value of variable b is 0, a read-write lock is not held by any process or thread, so b is set to WRITE_ACQUIRED at step S620, and the write lock operation may be terminated.
Referring to
Here, when the value of variable b is changed by any of the operations illustrated in
In order to solve these problems, the present invention proposes synchronization technology for improving the concurrent read performance of a critical section in distributed shared memory using a new read-write lock.
Referring to
Here, the read-write lock may be an array form including multiple entries, the number of which corresponds to the maximum number of physical nodes in the multi-node system.
Here, each of the multiple entries includes a single lock variable for each node, and the multiple entries may be aligned so as to correspond to a minimum management unit size of the distributed shared memory environment.
For example, the biggest difference between the existing read-write lock 800 illustrated in
According to the present invention, the lock variable may be easily implemented as an array form having entries, the number of which is equal to the maximum number of physical nodes allowed in the system.
Here, in order to prevent false sharing between the nodes, each of the entries included in the lock variable array 910 may be aligned so as to correspond to the minimum management unit size of the distributed shared memory.
For example, when the management unit size of distributed shared memory is equal to a page size (4 KB) of x86 architecture, each of the entries of the lock variable array 910 may be aligned according to the size of 4 KB.
Here, the values of the lock variables for the respective nodes, included in the multiple entries, are checked, whereby whether a read lock is held or whether a write lock is held may be checked.
The process of checking whether the lock is held on the current node will be described in detail with reference to
Also, in the synchronization method for improving the concurrent read performance of a critical section in distributed shared memory according to an embodiment of the present invention, the distributed-shared-memory management apparatus in a physical node of the multi-node system acquires a lock for a read operation or a write operation in consideration of whether a lock is held on each node at step S120.
Here, when a read lock is held on the current node, the lock variable included in the entry corresponding to the current node, among the multiple entries, is incremented, whereby the lock for the read operation may be acquired.
Here, when a read lock or a write lock is held on one or more nodes, the release of the read lock or the write lock is waited for, after which the lock for a write operation may be acquired by changing the values of the lock variables for the respective nodes, included in the multiple entries, to a write lock acquisition state.
Hereinafter, a process of acquiring a lock for a read operation will be described in detail with reference to
Referring to
When it is determined at step S1005 that the value of the lock variable is a write lock acquisition state (WRITE_ACQUIRED), this indicates that the write lock is held by another process or thread. Accordingly, the release of the write lock is waited for at step S1010, and the value of b[current_node] may be checked again.
Also, when it is determined at step S1005 that the value of the lock variable is not a write lock acquisition state (WRITE_ACQUIRED), this indicates that the lock is not held by any process or thread or that another process or thread holds a read lock, so the read lock may be acquired. Accordingly, the entry value b[current_node] corresponding to the current node is incremented by 1 at step S1020, and the process of acquiring the read lock may be terminated.
Hereinafter, a process of acquiring a lock for a write operation will be described in detail with reference to
Referring to
Here, the last physical node ID of the current system may be the ID of the last entry included in the read-write lock. For example, referring to
When it is determined at step S1215 that the value of variable i is greater than the last physical node ID, this indicates that the process of updating all of the lock variables for the respective nodes, included in the read-write lock, finishes, so the process of acquiring the write lock may be terminated.
Also, when it is determined at step S1215 that the value of variable i is not greater than the last physical node ID, whether the i-th entry value in the read-write lock is 0 may be determined at step S1225.
That is, whether another process or thread holds a read lock or a write lock on the i-th physical node may be checked.
When it is determined at step S1225 that the i-th entry value is not 0, the i-th entry value may be checked again after waiting until the i-th entry value becomes 0 such that the read lock or write lock is released at step S1230.
Also, when it is determined at step S1225 that the i-th entry value is 0, this indicates that the lock is not held. Accordingly, the i-th entry value is atomically set to WRITE_ACQUIRED at step S1240, the value of variable i is incremented by 1 at step S1250, and the process returns to step S1215.
Here, the process is performed for the values of all of the entries included in the read-write lock while incrementing the value of variable i by 1, whereby the write lock may be acquired.
Also, in the synchronization method for improving the concurrent read performance of a critical section in distributed shared memory according to an embodiment of the present invention, the distributed-shared-memory management apparatus in a physical node of the multi-node system releases the lock based on the lock variables for the respective nodes at step S130 when a read operation or a write operation is terminated.
Here, the value of the lock variable included in the entry corresponding to the current node, among the multiple entries, is decreased, whereby the lock acquired for the read operation may be released.
Here, the lock variables for the respective nodes, included in the multiple entries, are initialized, whereby the lock acquired for the write operation may be released.
Hereinafter, a process by which a lock acquired for a read operation is released will be described in detail with reference to
Referring to
Here, referring to
Here, because the entries forming the read-write lock are aligned according to the minimum memory management unit size, false sharing with other nodes is prevented, whereby memory bouncing between the physical nodes does not occur.
Hereinafter, the process by which a lock acquired for a write operation is released will be described in detail with reference to
Referring to
Here, the last physical node ID of the current system may be the ID of the last entry included in the read-write lock. For example, referring to
When it is determined at step S1315 that the value of variable i is greater than the last physical node ID, this indicates that the process of updating all of the lock variables for the respective nodes, included in the read-write lock, finishes, so the process of releasing the write lock may be terminated.
Also, when it is determined at step S1315 that the value of variable i is not greater than the last physical node ID, the i-th entry value in the read-write lock is atomically initialized to 0 at step S1320, the value of variable i is incremented by 1 at step S1330, and the process is returned to step S1315.
Here, referring to
Here, the order in which the entries included in the array corresponding to the read-write lock are accessed remains constant. Accordingly, when multiple processes simultaneously attempt to acquire a write lock, a deadlock, which may occur when the multiple processes arbitrarily access the write lock, may be prevented.
That is, as illustrated in
Through the above-described synchronization method for improving the concurrent read performance of a critical section in distributed shared memory, when concurrent reads of a critical section are attempted in a distributed shared memory environment including multiple physical nodes, performance degradation caused due to a shared lock variable may be mitigated.
Also, because a lock variable is assigned for each node and is efficiently managed, when a read lock is held, performance degradation caused due to sharing of a lock variable may be minimized.
Also, the degree of parallelism of a system may be improved by improving the concurrent read performance of a critical section, and overall system performance may be improved.
Referring to
The processor 1410 checks whether a lock is held on each node based on a read-write lock having lock variables for respective nodes in a distributed shared memory environment.
Here, the read-write lock may have an array form including multiple entries, the number of which corresponds to the maximum number of physical nodes in a multi-node system.
Here, each of the multiple entries includes each of the lock variables for the respective nodes, and the multiple entries may be aligned so as to correspond to the minimum management unit size of the distributed shared memory environment.
Here, the lock variables for the respective nodes, included in the multiple entries, are checked, whereby a read lock is held or whether a write lock is held may be checked.
Also, the processor 1410 acquires a lock for a read operation or a write operation in consideration of whether a lock is held on each node.
Here, when a read lock is held on the current node, the value of the lock variable included in the entry corresponding to the current node, among the multiple entries, is increased, whereby the lock for the read operation may be acquired.
Here, when a read lock or a write lock is held on one or more nodes, the release of the read lock or the release of the write lock is waited for, after which the lock for the write operation may be acquired by changing the values of the lock variables for the respective nodes, included in the multiple entries, to a write lock acquisition state.
When a read operation or a write operation is terminated, the processor 1410 releases the lock based on the lock variables for the respective nodes.
Here, the value of the lock variable included in the entry corresponding to the current node, among the multiple entries, is decreased, whereby the lock acquired for the read operation may be released.
Here, the values of the lock variables for the respective nodes, included in the multiple entries, are initialized, whereby the lock acquired for the write operation may be released.
Also, the memory 1420 stores the read-write lock.
Using the above-described apparatus for managing distributed shared memory, when concurrent reads of a critical section are attempted in a distributed shared memory environment including multiple physical nodes, performance degradation caused due to a shared lock variable may be mitigated.
Also, because a lock variable is assigned for each node and is efficiently managed, when a read lock is held, performance degradation caused due to sharing of a lock variable may be minimized.
Also, the degree of parallelism of a system may be improved by improving the concurrent read performance of a critical section, and overall system performance may be improved.
According to the present invention, when concurrent reads of a critical section are attempted in a distributed shared memory environment including multiple physical nodes, performance degradation caused due to a shared lock variable may be mitigated.
Also, the present invention assigns a lock variable for each node and efficiently manages the same, thereby providing a new read-write synchronization mechanism capable of minimizing performance degradation caused due to sharing of a lock variable when a read lock is held.
Also, the present invention may improve the degree of parallelism of a system by improving the concurrent read performance of a critical section, and may improve overall system performance.
As described above, the synchronization method for improving the concurrent read performance of a critical section in distributed shared memory and an apparatus for the same according to the present invention are not limitedly applied to the configurations and operations of the above-described embodiments, but all or some of the embodiments may be selectively combined and configured, so the embodiments may be modified in various ways.
Number | Date | Country | Kind |
---|---|---|---|
10-2021-0133768 | Oct 2021 | KR | national |
10-2022-0104515 | Aug 2022 | KR | national |