This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-091361, filed on Apr. 28, 2015, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to an information processing device, a parallel processing program and a method for accessing shared memory.
An information processing device that performs parallel computation includes an exclusive control function that maintains the consistency of data in a shared memory domain accessed by a plurality of threads.
One method of exclusive control makes other threads wait to start their access processing to a shared memory while one thread is accessing the shared memory (hereinafter called the lock method). For example, each thread judges whether it is able to access the shared memory domain by referring to a variable that indicates the exclusion state of the shared memory domain.
Another method of exclusive control (hereinafter called the HTM method) uses the hardware transactional memory (HTM) included in the processor of the information processing device. The HTM mechanism guarantees that a sequence of instructions designated by a user (hereinafter called the target routine) is carried out as an atomic transaction with respect to the processing carried out by other threads. When a memory access conflict with another thread occurs during the execution of the target routine, the HTM rolls back the execution of the target routine. Techniques related to HTM are described, for example, in the following patent documents 1-3.
When creating a program, the user selects either the lock method or the HTM method as the exclusive control method for the program.
[Patent document 1] Japanese National Publication of International Patent Application No. 2013-513888.
[Patent document 2] Japanese National Publication of International Patent Application No. 2013-520753.
[Patent document 3] Japanese Laid-open Patent Publication No. 2012-128628.
However, when only a single thread accesses the shared memory, the processing time of a program that uses exclusive control of the HTM method may be longer than that of a program that uses exclusive control of the lock method. The number of running threads changes depending on the processing of the program. Therefore, at the time of creating the program, it is not easy to appropriately select the exclusive control method to adopt.
According to an aspect of the embodiments, an information processing device includes a storage unit having a shared memory area, and a processing unit which carries out one or more threads, and
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Hereinafter, embodiments will be described with reference to the figures. However, the technical scope of the invention is not limited to the embodiments and extends to the subject matter disclosed in the claims and its equivalents.
In an information processing device performing parallel computation, when a plurality of threads access a common resource at the same time, the common resource may become inconsistent. Exclusive control prevents the plurality of threads from accessing the common resource at the same time. Performing exclusive control therefore avoids inconsistency of the common resource.
A thread is the smallest unit of execution of a program running on an operating system. The information processing device according to the embodiment realizes multi-thread processing, in which the plurality of threads are carried out at the same time. The common resource according to the embodiment is a shared memory domain that the plurality of threads can access, and it is some or all of the area of the shared memory.
Firstly, according to
[Lock Method]
In addition, a critical section depicted in
The lock method realizes exclusive control by making other threads wait to start their access processing to the shared memory domain while one thread is performing access processing to the shared memory domain. Examples of the lock method include a spin lock method, a mutex method, and a semaphore method. The embodiment exemplifies a case that uses the spin lock method based on a lock variable in memory.
According to the lock method, each thread “th” acquires a lock at the start of the access processing for the same shared memory domain, namely at the start of the critical section. The lock can be acquired when the lock variable in memory indicates a non-lock state. In that case, the thread “th” changes the value of the lock variable from the non-lock state to the lock state and thereby acquires the lock.
On the other hand, a thread “th” cannot acquire the lock when the lock variable indicates the lock state. The lock state indicates that another thread has updated the lock variable and currently holds the lock. Therefore, the thread “th” waits to acquire the lock until the other thread updates the lock variable to the non-lock state and the lock is released.
Each thread “th” starts the critical section after acquiring the lock. When the thread “th” finishes the critical section, it updates the lock variable from the lock state to the non-lock state and thereby releases the lock.
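The acquire, wait, and release behavior described above can be sketched as follows. This is a minimal illustration in Python, not the embodiment's implementation: the class name `SpinLock`, the helper `count_with_spin_lock`, and the encoding of the lock variable as 0 (non-lock state) and 1 (lock state) are assumptions made for the example. Because Python has no atomic test-and-set, an internal `threading.Lock` stands in for the atomic instruction that a real spin lock would rely on.

```python
import threading
import time


class SpinLock:
    """Spin lock driven by a lock variable (assumed encoding: 0 or 1)."""

    UNLOCKED = 0  # non-lock state
    LOCKED = 1    # lock state

    def __init__(self):
        self.lock_variable = SpinLock.UNLOCKED
        # Stand-in for an atomic test-and-set instruction (assumption).
        self._guard = threading.Lock()

    def try_acquire(self):
        """Change the lock variable from non-lock to lock state if possible."""
        with self._guard:
            if self.lock_variable == SpinLock.UNLOCKED:
                self.lock_variable = SpinLock.LOCKED
                return True
            return False

    def acquire(self):
        """Wait until another thread returns the variable to the non-lock state."""
        while not self.try_acquire():
            time.sleep(0)  # yield so the lock holder can make progress

    def release(self):
        """Return the lock variable to the non-lock state."""
        with self._guard:
            self.lock_variable = SpinLock.UNLOCKED


def count_with_spin_lock(num_threads, increments):
    """Each thread increments a shared counter inside a critical section."""
    lock = SpinLock()
    shared = {"counter": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()
            shared["counter"] += 1  # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["counter"]
```

Calling `count_with_spin_lock(4, 500)` runs four threads that each enter the critical section 500 times; since only one thread holds the lock at a time, no increment is lost.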
According to
On the other hand, the other thread “thB” attempts to acquire the lock at the timing t3, after the one thread “thA” has started the critical section. However, the other thread “thB” waits for the one thread “thA” to release the lock because the one thread “thA” already holds the lock. When the one thread “thA” releases the lock at the timing t2, the other thread “thB” acquires the lock and starts the critical section. The other thread “thB” releases the lock when it finishes the critical section.
As depicted by
The one thread “thA” and the other thread “thB” may be threads created by the execution of the same program or by the execution of different programs. Likewise, the processing in the critical section of the one thread “thA” and that of the other thread “thB” may be the same or different.
Then, according to
HTM method will be described.
[HTM Method]
The HTM method uses the HTM mechanism provided in hardware by the CPU (Central Processing Unit) of the information processing device. The HTM method realizes exclusive control by canceling the access processing of one thread to the shared memory domain when another thread writes to the shared memory domain during that access processing.
The HTM is a mechanism that supports parallel programming. The HTM reduces contention caused by exclusion during the execution of a parallel program and thereby improves performance. For example, CPUs such as Rock by Sun Microsystems (registered trademark), the Blue Gene/Q Compute chip by IBM (registered trademark), and the Core i7 of the Haswell microarchitecture by Intel (registered trademark) are equipped with an HTM mechanism.
The HTM carries out a sequence of instructions designated by a user as an atomic and isolated transaction. The HTM guarantees that the sequence of instructions designated as the atomic transaction (hereinafter called the target routine) is carried out as a single transaction with respect to processing that other threads execute in parallel. When creating the program, the user adds an HTM start instruction before, and an HTM end instruction after, the target routine that is to be carried out as the atomic transaction.
When another thread writes to a memory address that the target routine accesses between the start instruction and the end instruction, the HTM detects a conflict (a memory access conflict). When the HTM detects the conflict, it aborts the target routine and rolls it back. When the HTM does not detect a conflict, it continues the target routine to completion. In this way, according to the HTM method, each thread “th” carries out its target routine speculatively so that processing runs in parallel.
Specifically, the HTM carries out pre-processing in response to the execution of the start instruction. The pre-processing consists of saving the internal state (register information) of the processor core, reading the data in the memory area that the target routine accesses (reads or writes), and storing the read data in a temporary domain.
According to the HTM method, the thread “th” performs the write processing of the target routine on the temporary domain (for example, the L1 (level 1) cache) in which the data was stored by the pre-processing. In other words, the thread “th” defers reflecting the result of the target routine in memory until the HTM end instruction is executed. The HTM detects a conflict when another thread writes data to a memory address that the target routine accesses during the period from the start instruction to the end instruction.
The HTM aborts (interrupts) the transaction when it detects the conflict. Specifically, the HTM stops the processing of the target routine and returns the internal state (register information) of the CPU, except for the EAX register, to the state at the time the start instruction was executed (this is called rollback). The HTM also discards the result data of the write processing stored in the temporary domain. The EAX register holds information indicating the reason for the abort. The HTM then transfers execution of the program to the abort routine designated by the start instruction. For example, the abort routine instructs a rerun of the target routine based on the value in the EAX register.
On the other hand, when the HTM does not detect a conflict between the start instruction and the end instruction, it carries out post-processing when the end instruction of the target routine is executed. The post-processing writes the result data of the write processing held in the temporary domain into memory.
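The buffering, conflict detection, rollback, and post-processing described above can be modeled in software. The following is a simplified, single-threaded sketch in Python and not real HTM: the names `SharedMemory`, `Transaction`, `AbortError`, and `run_transaction` are hypothetical, the per-address version counter is an assumed way to notice a write by another transaction, and the conflict check is performed only at the end instruction, which is a simplification of what hardware does.

```python
class AbortError(Exception):
    """Raised when a transaction is aborted and rolled back."""


class SharedMemory:
    """Shared memory domain with an assumed per-address version counter."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.version = {addr: 0 for addr in initial}


class Transaction:
    """Software model of a target routine between start and end instruction."""

    def __init__(self, memory):
        # Start instruction: begin tracking accesses and buffering writes.
        self.memory = memory
        self.seen = {}    # address -> version observed at first access
        self.buffer = {}  # temporary domain holding speculative writes

    def _track(self, addr):
        if addr not in self.seen:
            self.seen[addr] = self.memory.version[addr]

    def read(self, addr):
        self._track(addr)
        if addr in self.buffer:
            return self.buffer[addr]
        return self.memory.data[addr]

    def write(self, addr, value):
        self._track(addr)
        self.buffer[addr] = value  # memory itself is not updated yet

    def end(self):
        """End instruction: commit, or abort if an accessed address changed."""
        for addr, version in self.seen.items():
            if self.memory.version[addr] != version:
                self.buffer.clear()  # rollback: discard speculative results
                raise AbortError(addr)
        for addr, value in self.buffer.items():  # post-processing
            self.memory.data[addr] = value
            self.memory.version[addr] += 1


def run_transaction(memory, routine, max_attempts=10):
    """Rerun the target routine after an abort; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        tx = Transaction(memory)
        routine(tx)
        try:
            tx.end()
            return attempt
        except AbortError:
            continue
    raise AbortError("too many aborts")
```

In this model, two transactions that both access the same address can be started together; the one that reaches `end()` second is aborted and leaves memory untouched, and a rerun then succeeds against the updated value.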
The one thread “thA” executes the HTM start instruction at a timing t1 to start the critical section. As described above, during the critical section, the one thread “thA” operates on the data that was read from the shared memory domain and stored in the temporary domain (local area) when the start instruction was executed. Therefore, the one thread “thA” does not directly update the shared memory domain during the execution of the critical section.
On the other hand, another thread “thB” executes the start instruction at a timing t3, after the one thread “thA” has executed the start instruction. Like the one thread “thA”, the thread “thB” operates on the data that was read from the shared memory domain and stored in the temporary domain when the start instruction was executed.
In the example of
Therefore, the HTM does not detect a conflict when the one thread “thA” executes the end instruction at a timing t2 (that is, when the one thread “thA” writes its result data to the shared memory domain). Accordingly, the HTM does not abort the critical section of the thread “thB”, and it commits (completes) the critical section of the one thread “thA”.
The thread “thB” executes the HTM end instruction at a timing t4, when it finishes the critical section. The HTM writes the result data updated by the critical section of the thread “thB” into the shared memory domain.
As depicted by
According to the example of
Therefore, the HTM detects the conflict when the one thread “thA” executes the end instruction at a timing t2 (that is, when the one thread “thA” writes its result data to the shared memory domain) and aborts the critical section of the thread “thB”. The HTM then rolls back the critical section of the thread “thB”. In other words, the HTM cancels the processing of the critical section of the thread “thB”.
When the conflict occurs, the thread “thB” carries out the critical section again. As in the first attempt, the thread “thB” executes the HTM start instruction and starts the critical section. When no conflict occurs, the thread “thB” finishes the critical section and executes the HTM end instruction at the end.
In this way, when the one thread “thA” writes to the shared memory domain while the thread “thB” is performing access processing to the shared memory domain, the HTM cancels the access processing of the thread “thB”. This prevents simultaneous memory access processing on the same shared memory domain and avoids inconsistency of the data stored in the shared memory domain.
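The two cases above, one in which the thread “thB” completes without interruption and one in which it is aborted and rerun, differ only in whether the addresses written by the thread “thA” overlap the addresses accessed by the thread “thB”. The short Python sketch below illustrates that rule. The function names and the address labels `"x"` and `"y"` are hypothetical, and the event strings merely paraphrase the steps in the text.

```python
def detects_conflict(writes_by_other, accesses_by_transaction):
    """A conflict exists when another thread writes an address that the
    in-flight transaction reads or writes."""
    return not set(writes_by_other).isdisjoint(accesses_by_transaction)


def timeline(thA_writes, thB_accesses):
    """List the events for thA and thB in the order described in the text."""
    events = [
        "thA start",
        "thB start",
        "thA end: result written to shared memory",
    ]
    if detects_conflict(thA_writes, thB_accesses):
        # thB's speculative work is cancelled and the critical section reruns.
        events += ["thB abort and rollback", "thB start (rerun)"]
    events.append("thB end: result written to shared memory")
    return events
```

For example, `timeline({"x"}, {"y"})` contains no abort because the address sets are disjoint, whereas `timeline({"x"}, {"x", "y"})` includes the abort, rollback, and rerun of the thread “thB”.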
As depicted by
[Performance by the Method of the Exclusive Control]
Then, according to
The closer the value on the vertical axis is to “1”, the shorter the processing time of the program, namely, the higher the performance.
Each of the marks (circle, square, triangle, diamond) in the graph corresponds to a test pattern of memory access. Each white mark indicates the performance of memory access processing under exclusive control of the lock method, and each black mark indicates the performance of memory access processing under exclusive control of the HTM method.
According to the graph in
As explained by
In addition, as represented by
According to the graph in
As mentioned by
As depicted by
As depicted by
Therefore, when a thread “th” executes access processing to the shared memory domain “Sm”, the information processing device according to the embodiment judges whether a plurality of threads “th” that access the shared memory domain “Sm” are being carried out. When it judges that a single thread “th” is being carried out, the information processing device carries out the access processing to the shared memory domain “Sm” based on the first method (lock method). When it judges that a plurality of threads “th” are being carried out, the information processing device carries out the access processing to the shared memory domain “Sm” based on the second method (HTM method).
As described by
In other words, as depicted by
Therefore, based on the running state of the threads “th” that access the same shared memory domain “Sm”, the information processing device can select, and switch to, the higher-performing exclusive control method during the execution of the program. The information processing device can thus carry out the access processing to the shared memory domain “Sm” by each thread “th” efficiently while maintaining the consistency of the shared memory domain “Sm”. In other words, the information processing device can improve the performance of exclusive control of the access processing to the shared memory domain “Sm”.
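The selection rule of the embodiment can be written as a small function. This is a minimal Python sketch; the names `Method` and `select_method` are hypothetical, and the only behavior taken from the text is that a single running thread leads to the lock method and a plurality of running threads leads to the HTM method.

```python
from enum import Enum


class Method(Enum):
    LOCK = "lock"  # first method
    HTM = "htm"    # second method


def select_method(num_running_threads):
    """Choose the exclusive control method from the number of running
    threads that access the same shared memory domain."""
    if num_running_threads >= 2:
        return Method.HTM
    return Method.LOCK
```

Because the function is evaluated each time a thread is about to enter a critical section, the chosen method follows changes in the number of running threads during the execution of the program.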
[Hardware Constitution of Information Processing Device]
The CPU 101 is connected to the memory 102, etc. through the bus 106 and controls the whole of information processing device 100. In addition, the CPU 101 has a plurality of processor cores, which is not illustrated in
The RAM 120 in the memory 102 stores the data that the CPU 101 processes. For example, the RAM 120 has the shared memory domain (shared memory area) “Sm”. However, the embodiment is not limited to this example, and the nonvolatile memory 121 may have the shared memory domain “Sm”.
The nonvolatile memory 121 in the memory 102 includes an operating system storage domain 131 and an application program storage domain 132. The nonvolatile memory 121 is, for example, a nonvolatile semiconductor memory.
The operating system (hereinafter called the operating system 131) in the operating system storage domain 131 is executed by the CPU 101 and realizes the processing of the operating system running on the information processing device 100. The operating system storage domain 131 has an exclusive control program storage domain 133. The exclusive control program (hereinafter called the exclusive control program 133) in the exclusive control program storage domain 133 realizes exclusive control processing of the shared memory domain “Sm”. The processing of the exclusive control program 133 will be described later according to
The application program (hereinafter called the application program 132) in the application program storage domain 132 is executed by the CPU 101, runs on the operating system 131, and realizes predetermined processing. The application program 132 calls the exclusive control program 133 when it accesses the shared memory domain “Sm”.
[Software Block of Information Processing Device]
The exclusion acquisition module 141 has an exclusion acquisition module 142 of the HTM method and an exclusion acquisition module 143 of the lock method. In addition, the exclusion release module 151 has an exclusion release module 152 of the HTM method and an exclusion release module 153 of the lock method.
The exclusion acquisition module 141 refers to the storage area 170 for the number of simultaneous running threads, which is in memory such as the RAM 120, and acquires the number of running threads that access the same shared memory domain “Sm”. Based on the acquired number of threads, the exclusion acquisition module 141 calls either the exclusion acquisition module 142 of the HTM method or the exclusion acquisition module 143 of the lock method.
The exclusion acquisition module 142 of the HTM method performs start processing of the exclusive control based on the HTM method. Especially the exclusion acquisition module 142 of the HTM method calls the start instruction which notifies HTM 200 of a start of the transaction (target routine) that the HTM 200 (referring to
The exclusion acquisition module 143 of the lock method performs start (acquisition) processing of exclusive control based on the lock method, using the lock variable 160 in memory such as the RAM 120. Specifically, the exclusion acquisition module 143 of the lock method waits to start the critical section until the lock variable 160 changes to the non-lock state. Then, when one thread changes the lock variable 160 to the non-lock state, the exclusion acquisition module 143 of the lock method updates the lock variable 160 to the lock state for another thread.
Like the exclusion acquisition module 141, the exclusion release module 151 refers to the storage area 170 for the number of simultaneous running threads and acquires the number of running threads that access the same shared memory domain “Sm”. Based on the acquired number of threads, the exclusion release module 151 calls either the exclusion release module 152 of the HTM method or the exclusion release module 153 of the lock method.
The exclusion release module 152 of the HTM method performs end processing of exclusive control based on the HTM method. Specifically, it calls an end instruction that notifies the HTM 200 of the end of the transaction being processed by the HTM 200. The exclusion release module 153 of the lock method performs end (release) processing of exclusive control based on the lock method. Specifically, it updates the lock variable 160 to the non-lock state.
[The Number of the Threads]
The information processing device 100 performing the parallel computation runs, for example, a thread scheduler 180. The thread scheduler 180 is a process of the operating system 131 that schedules the threads “th”. The thread scheduler 180 selects the thread whose execution is to be started and assigns it to a processor core (not illustrated in
For example, each thread “th” refers to the number of the simultaneous running threads storage area 170 and acquires the number of the threads carrying out the execution to access the same shared memory domain “Sm” at the same time (sign “p1” in
In addition, the method, in which the thread “th” acquires the number of the running threads which accesses the same shared memory domain “Sm”, is not a thing limited to an example of
Then, according to
[Processing of Exclusive Control Program 133]
S11: The application program 132 calls the exclusion acquisition module 141 in the exclusive control program 133 before the execution start of the critical section.
S12: The exclusion acquisition module 141 refers to the number of the simultaneous running threads storage area 170 which is explained in
S13: When the number of simultaneous running threads is two or more (Yes in S12), the exclusion acquisition module 141 calls the exclusion acquisition module 142 of the HTM method. The exclusion acquisition module 142 of the HTM method executes the HTM start instruction and carries out the pre-processing of the HTM method. The details of the processing in the process S13 will be described later in a flow chart of
S14: On the other hand, when the number of simultaneous running threads is one (No in S12), the exclusion acquisition module 141 calls the exclusion acquisition module 143 of the lock method. The exclusion acquisition module 143 of the lock method acquires the lock based on the lock variable 160. The details of the processing in the process S14 will be described later in a flow chart of
S15: When the exclusion acquisition processing (process S13 or process S14) is finished, the exclusion acquisition module 141 returns control to the application program 132. The thread then carries out the access processing (critical section) to the shared memory domain “Sm”, which is processing of the application program 132.
When exclusive control of the HTM method is selected and the HTM 200 detects a conflict (a memory access conflict) during the execution of the critical section, the HTM 200 aborts the critical section and performs the rollback. For example, the thread “th” executes the HTM start instruction again when it carries out the critical section again.
S16: When the critical section is finished, the application program 132 calls the exclusion release module 151 in the exclusive control program 133.
S17: The exclusion release module 151 judges whether the exclusion acquisition processing (S13, S14) was based on the HTM method or the lock method.
S18: When the exclusion acquisition processing is based on the HTM method (described as HTM method in
S19: When the exclusion acquisition processing is based on the lock method (described as lock method in
The details of the processing in the process S19 will be mentioned later in a flow chart of
As depicted by
Then, change of the exclusive control method, when a method of the exclusive control is selected according to the flow chart in
[Change of Exclusive Control]
The application program 132 starts running the thread “thA” at a timing t11. When the thread “thA” starts running, the thread scheduler 180 updates the value in the storage area 170 for the number of simultaneous running threads from “0” to “1”.
The thread “thA” starts the critical section before the thread “thB” starts a run. The thread “thA” calls the exclusion acquisition module 141 (S11 in
On the other hand, the application program 132 starts a run of thread “thB” during a run of thread “thA” (at a timing t12 in
However, the thread “thA” already holds the exclusion based on the lock method at the timing t13. Exclusive control does not function if different exclusive control methods are applied to the same shared memory domain “Sm”. In other words, the same exclusive control method must be used for the same shared memory domain “Sm”. Therefore, the thread “thB” waits to perform the exclusion acquisition processing based on the HTM method until the thread “thA” releases the exclusion based on the lock method (S19 of
And when the thread “thA” releases the exclusion, at a timing t14 (S19 of
In this way, when a plurality of threads “th” are not being carried out, the thread “thA” selects the lock method. However, while the exclusion is held under the lock method, a new thread “thB” may start running and the value in the storage area 170 for the number of simultaneous running threads may change from “1” to “2”. In this case, the thread “thB” waits to start its access processing (critical section) to the shared memory domain “Sm” under exclusive control of the HTM method while the access processing to the shared memory domain “Sm” under exclusive control of the lock method is in progress.
In other words, when the information processing device 100 starts executing a new thread while a single thread is being carried out and thereby changes to a state of carrying out a plurality of threads, the information processing device 100 makes the new thread wait to start its access processing based on the HTM method until the access processing based on the lock method finishes. In this way, the information processing device 100 can appropriately realize exclusive control with an exclusive control method common to the plurality of threads “th”, even if the number of running threads that access the same shared memory domain “Sm” increases from one to more than one during the access processing.
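The waiting behavior described above, in which a thread that has selected the HTM method does not start its transaction while the lock of the lock method is still held, can be sketched as follows. This Python sketch is illustrative only: `ExclusionState`, `htm_acquire`, and `demo` are hypothetical names, the lock variable encoding (0 and 1) is assumed, and the HTM start instruction is represented by a log entry because Python cannot issue it.

```python
import threading
import time

UNLOCKED = 0  # non-lock state (assumed encoding)
LOCKED = 1    # lock state (assumed encoding)


class ExclusionState:
    def __init__(self):
        self.lock_variable = UNLOCKED
        self.log = []


def htm_acquire(state, name):
    # Wait until the lock taken under the lock method has been released.
    while state.lock_variable == LOCKED:
        time.sleep(0.001)
    # A real implementation would execute the HTM start instruction here.
    state.log.append(name + ": HTM start")


def demo():
    state = ExclusionState()
    # thA runs alone, so it acquired the exclusion with the lock method.
    state.lock_variable = LOCKED
    state.log.append("thA: lock acquired")
    # thB starts while thA is in its critical section and selects HTM.
    thB = threading.Thread(target=htm_acquire, args=(state, "thB"))
    thB.start()
    time.sleep(0.05)  # thA is still inside its critical section
    state.log.append("thA: lock released")
    state.lock_variable = UNLOCKED
    thB.join()
    return state.log
```

In `demo()`, the entry for the HTM start of the thread “thB” can only be logged after the lock variable has returned to the non-lock state, so the two threads never apply different exclusive control methods to the shared memory domain at the same time.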
In
The HTM 200 aborts the critical section of the thread “thB” and performs the rollback when a memory access conflict with the critical section of the thread “thB” occurs at the time the end instruction of the critical section of the thread “thA” is executed (×1). When carrying out the critical section again, the thread “thB” acquires the exclusion based on the HTM method according to the value in the storage area 170 for the number of simultaneous running threads (S13) and carries out the critical section (S15).
When the thread “thA” stops (finishes) running at a timing t15, the thread scheduler 180 updates the value in the storage area 170 for the number of simultaneous running threads from “2” to “1”. At the end of its critical section (a timing t16), the thread “thB” carries out the exclusion release processing based on the method selected at the time of the exclusion acquisition (namely, the HTM method), even though the value in the storage area 170 has already been updated to “1” (S18).
In other words, when the information processing device 100 finishes executing one of the threads while carrying out a plurality of threads and transitions to a state in which a single thread is carried out, the information processing device 100 carries out the end (exclusion release) processing based on the HTM method at the end of the access processing. In this way, the information processing device 100 can appropriately carry out the exclusion release processing based on the exclusive control method used at the time of the exclusion acquisition, even if the number of running threads that access the same shared memory domain “Sm” decreases from more than one to one during the access processing.
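The point that the release follows the method recorded at acquisition, rather than the current thread count, can be illustrated with the following Python sketch. The class `ExclusiveControl` and its attribute names are hypothetical; the real lock and HTM operations are replaced by log entries, and keeping the recorded method per thread (via `threading.local`) is an assumption made for the example.

```python
import threading
from enum import Enum


class Method(Enum):
    LOCK = "lock"
    HTM = "htm"


class ExclusiveControl:
    def __init__(self):
        self.num_threads = 1  # stands in for the thread count storage area
        self.calls = []       # log of the operations that would be issued
        self._local = threading.local()  # per-thread record (assumption)

    def acquire(self):
        """Select the method from the current thread count and record it."""
        if self.num_threads >= 2:
            method = Method.HTM
        else:
            method = Method.LOCK
        self._local.access_form = method
        self.calls.append(("acquire", method))
        return method

    def release(self):
        """Release with the recorded method, ignoring the current count."""
        method = self._local.access_form
        self.calls.append(("release", method))
        return method
```

If `num_threads` drops from 2 to 1 between `acquire()` and `release()`, the release still uses the HTM method, and only the next `acquire()` switches to the lock method.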
And the thread “thB” starts the critical section at a timing t17 after a stop of the thread “thA”. Then the thread “thB” selects the lock method according to value “1” in the number of the simultaneous running threads storage area 170 (S12 in
Then, according to
[Performance of the Exclusive Control Method According to Embodiment]
The elements indicated by the horizontal axis, the vertical axis and the marks in the graph of
When the number of running threads “th” that access the same shared memory domain “Sm” is two or more, the exclusive control method according to the embodiment adopts the HTM method. Therefore, according to the graph in
When the number of running threads “th” that access the same shared memory domain “Sm” is one, the exclusive control method according to the embodiment adopts the lock method. Therefore, according to the graph in
As illustrated by
Then, according to
[Example of the Program]
The program pr1 carries out the description c1 before the execution start of the critical section (c3, S15 of
The description c11 represented by
The description c13 indicates the processing for the case where the number of running threads “numThreads” is greater than “1” (Yes of S12 in
The description c14 indicates the processing for the case where the number of running threads “numThreads” is “1” or less (No of S12 in
The description c21 represented by
The description c23 indicates an instruction (S18) which calls the exclusion release module 152 (rtm_wrapped_unlock( )) of the HTM method when the method “access_form” of the exclusive control set by the exclusion acquisition module 141 is the HTM method (HTM method of S17 of
Then, flows of the processing of the exclusion acquisition module 142 of the HTM method and the exclusion release module 152 of the HTM method will be described according to
[Processing of HTM Method]
S21: The exclusion acquisition module 142 of the HTM method judges whether or not the lock based on the lock method is released. As illustrated in
S22: When the lock based on the lock method has been released or when the exclusion is released based on the lock method (Yes of S21), the exclusion acquisition module 141 executes a start instruction of the HTM 200 and carries out the pre-processing of the HTM method. The pre-processing of the HTM method is mentioned above in
S31: The exclusion release module 152 of the HTM method executes an end instruction of HTM 200 and performs the post-processing of the HTM method. The post-processing of the HTM method is mentioned above in
[Processing of Lock Method]
S41: The exclusion acquisition module 143 of the lock method judges whether or not the lock based on the lock method is released. The exclusion acquisition module 143 of the lock method judges whether or not the lock is released based on whether or not a value of the lock variable “spinlock” 160 (
S42: When the lock based on the lock method has already been released, or when it is released during the wait (Yes of S41), the exclusion acquisition module 141 acquires the lock. In other words, the exclusion acquisition module 141 updates the value of the lock variable 160 from the value indicating the non-lock state to the value indicating the lock state.
S51: The exclusion release module 153 of the lock method releases the lock. In other words, it updates the value of the lock variable 160 from the value indicating the lock state to the value indicating the non-lock state.
The embodiment described above exemplifies the case in which the operating system 131 has the exclusive control program 133. However, the embodiment is not limited to this example. The application program 132 may include the exclusive control program 133.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2015-091361 | Apr 2015 | JP | national |