This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-043370, filed on Mar. 5, 2015, the entire contents of which are incorporated herein by reference.
A certain aspect of embodiments described herein relates to a computer readable medium, a mapping information generating method, and a mapping information generating apparatus.
There has been known a parallel computing system using multiple computers (hereinafter, referred to as nodes) to execute arithmetic processing in parallel as disclosed in, for example, Japanese Patent Application Publication No. 2014-137732 (Patent Document 1). The use of the parallel computing system greatly reduces computation time required for a large-scale numerical analysis.
In recent years, to meet the demand for computational performance, not only indirect network parallel computing systems, which interconnect nodes indirectly through a switch, but also direct network parallel computing systems, which interconnect nodes directly, have come into use. A fat tree network is a known example of the indirect network, while a torus network and a mesh network are known examples of the direct network. The torus network takes a variety of forms, one of which is a three-dimensional torus having a cuboid grid structure (see Patent Document 1).
In the aforementioned direct network parallel computing system, a technique called rank location optimization has been known as disclosed in, for example, Hiroaki IMADE and six others, “Reduction of Execution Time of RMATT for Communication Time Optimization for Large Scale Computation”, High Performance Computing Symposium 2012, Information Processing Society of Japan, January, 2012, p. 93-100 (Non Patent Document 1). This technique assigns (maps) ranks to appropriate nodes in accordance with a communication pattern when a Message Passing Interface (MPI) application is executed in the direct network parallel computing system. Here, the MPI application is a parallel program written in MPI. The rank is a number that is given to each process of the MPI application when the MPI application is executed. However, a process given a rank is sometimes itself called a rank. When the MPI application is executed based on the locations of the ranks obtained by the rank location optimization, the number of nodes passed through (the number of hops) and the congestion at the time of inter-process communication are reduced, and the communication processing time required for the inter-process communication can be reduced.
Various techniques have been suggested for the aforementioned rank location optimization. For example, there has been suggested a technique that divides a process group including multiple processes into divided process groups based on a result of the division of a network area including multiple nodes, and then places the divided process groups in one of the divided network areas as disclosed in, for example, Japanese Patent Application Publication No. 2012-243224 (Patent Document 2). Moreover, there has been suggested Simulated Annealing (SA) that measures communication load and randomly searches the optimized solution of the locations of the ranks based on the measurement results (see Non Patent Document 1).
According to an aspect of the present invention, there is provided a non-transitory computer readable medium storing a mapping information generation program that causes a computer to execute a process, the process including: placing a plurality of processes in a space generated by a computer; changing positions of the plurality of processes by applying at least one of an attracting force and a repulsive force between each two processes included in the plurality of processes; and generating information that maps the plurality of processes to a plurality of processors based on changed positions of the plurality of processes and positions of the plurality of processors.
When the Simulated Annealing disclosed in Non Patent Document 1 is employed in a large scale parallel computing system, the amount of search processing increases explosively because the search is performed randomly, and the amount of calculation required to obtain the optimized solution of the locations of the ranks increases accordingly. That is to say, there is a problem that a large amount of time is required to obtain the optimized solution of the locations of the ranks.
Hereinafter, a description will be given of an embodiment with reference to accompanying drawings.
The topology of the multiple nodes 110 included in the computing node group 100 is a three-dimensional torus. Thus, a line of the multiple nodes 110 placed on the X axis is connected in a ring shape, and a line of the multiple nodes 110 placed on the Y axis is also connected in a ring shape. In the same manner, a line of the multiple nodes 110 placed on the Z axis is connected in a ring shape.
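As an illustration of the ring connection along each axis, the number of hops between two nodes of a three-dimensional torus can be sketched as follows. The function name `torus_hops` and the assumption that communication follows a shortest path along each axis are illustrative only and are not taken from the embodiment.

```python
def torus_hops(a, b, shape):
    """Minimum number of hops between nodes a and b on a torus.

    a, b  : (x, y, z) integer coordinates of the two nodes.
    shape : number of nodes along each axis, e.g. (16, 8, 8).
    Shortest-path routing along each axis is an assumption of this sketch.
    """
    hops = 0
    for pa, pb, size in zip(a, b, shape):
        d = abs(pa - pb) % size
        # Because each axis is a ring, the shorter way around is taken.
        hops += min(d, size - d)
    return hops
```

For example, on a 16×8×8 torus the nodes at X=0 and X=15 on the same line are one hop apart, because the line is closed into a ring.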
The mapping information generating apparatus 300 is coupled to the computing node group 100 through a network NW1. Examples of the network NW1 include a Local Area Network (LAN). The mapping information generating apparatus 300 generates mapping information that defines the node 110 to which each process given a rank (hereinafter, referred to as a rank as appropriate) is to be mapped. The mapping information may be called, for example, a rank map file or a rank location file. The mapping information generating apparatus 300 maps the ranks to the multiple nodes 110 on a one-to-one basis based on the generated mapping information. This reduces the number of nodes passed through (the number of hops) and the congestion at the time of inter-process communication, thereby reducing the communication processing time required for the inter-process communication. At least one of the multiple nodes 110 included in the computing node group 100 may execute the function of the mapping information generating apparatus 300.
A terminal device 400 is coupled to the mapping information generating apparatus 300 through a network NW2. Examples of the network NW2 include the Internet. The terminal device 400 may be, for example, a Personal Computer (PC), a tablet terminal, or a handheld terminal. The user operates the terminal device 400 to transmit, in addition to the aforementioned network topology, at least a communication pattern described later to the mapping information generating apparatus 300. An initial state described later is also transmitted when the transmission of the initial state is requested. The mapping information generating apparatus 300 generates the mapping information based on at least the network topology and the communication pattern.
A description will next be given of a hardware configuration of the aforementioned node 110 with reference to
The ICC 112 and the main memory 113 are coupled to the CPU 111. The ICC 112 has multiple ports, and is coupled to the ICC 112 of each of the adjacent nodes 110 through the corresponding port. For example, when the ICC 112 has six ports, the ICC 112 is coupled to the ICC 112 of the adjacent node 110 through a first port in the +X axis direction, and is coupled to the ICC 112 of the adjacent node 110 through a second port in the −X axis direction. In the same manner, the ICC 112 is coupled to the ICC 112 of the adjacent node 110 through a third port in the +Y axis direction, and is coupled to the ICC 112 of the adjacent node 110 through a fourth port in the −Y axis direction. The ICC 112 is coupled to the ICC 112 of the adjacent node 110 through a fifth port in the +Z axis direction, and is coupled to the ICC 112 of the adjacent node 110 through a sixth port in the −Z axis direction. Each node 110 to which a rank is assigned executes the process while communicating with other nodes 110.
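The six-port coupling described above can be sketched as a function that returns, for a given node, the adjacent node reached in each axis direction. The function name `torus_neighbors` and the direction labels used as dictionary keys are hypothetical; the wraparound at the ends of each axis follows from the torus topology.

```python
def torus_neighbors(node, shape):
    """Adjacent nodes of `node` on a three-dimensional torus, keyed by direction.

    node  : (x, y, z) integer coordinates.
    shape : number of nodes along each axis.
    The keys "+X" ... "-Z" are illustrative labels for the first to sixth ports.
    """
    x, y, z = node
    sx, sy, sz = shape
    return {
        "+X": ((x + 1) % sx, y, z),  # first port
        "-X": ((x - 1) % sx, y, z),  # second port
        "+Y": (x, (y + 1) % sy, z),  # third port
        "-Y": (x, (y - 1) % sy, z),  # fourth port
        "+Z": (x, y, (z + 1) % sz),  # fifth port
        "-Z": (x, y, (z - 1) % sz),  # sixth port
    }
```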
A description will next be given of a hardware configuration of the aforementioned mapping information generating apparatus 300 with reference to
An input device 710 is coupled to the input I/F 300F. Examples of the input device 710 include a keyboard and a mouse.
A display device 720 is coupled to the output I/F 300G. Examples of the display device 720 include a liquid crystal display.
A semiconductor memory 730 is coupled to the input output I/F 300H. Examples of the semiconductor memory 730 include a Universal Serial Bus (USB) memory and a flash memory. The input output I/F 300H reads programs and data stored in the semiconductor memory 730.
The input I/F 300F and the input output I/F 300H include, for example, a USB port. The output I/F 300G includes, for example, a display port.
A portable recording medium 740 is inserted into the drive device 300I. Examples of the portable recording medium 740 include a removable disk such as a Compact Disc (CD)-ROM or a Digital Versatile Disc (DVD). The drive device 300I reads programs and data stored in the portable recording medium 740.
The network I/F 300D includes, for example, a port and a Physical Layer Chip (PHY chip). The mapping information generating apparatus 300 is coupled to the networks NW1, NW2 through the network I/F 300D.
The CPU 300A causes the aforementioned RAM 300B to store the programs stored in the ROM 300C and the HDD 300E. The CPU 300A causes the RAM 300B to store the programs stored in the portable recording medium 740. The execution of the stored programs by the CPU 300A implements the various functions described later, and implements the various operations. The programs are configured to correspond to flowcharts described later.
A description will next be given of the specifics of the mapping information generating apparatus 300 with reference to
The mapping information generating apparatus 300 includes, as illustrated in
The reception unit 301 receives the initial state, the network topology, and the communication pattern from the terminal device 400. The reception unit 301 transmits the initial state, the network topology, and the communication pattern that have been received to the rank location change unit 302. The communication pattern includes, as illustrated in
For example, when the computing node group 100 executes the MPI application AP illustrated in
The computing node group 100 generates the aforementioned initial state based on the communication pattern obtained as described above. More specifically, the computing node group 100 divides the processes into multiple groups each including ranks that frequently communicate with each other based on the communication amount between the ranks included in the communication pattern and the network topology. For example, as illustrated in
As illustrated in
The rank location change unit 302 receives information on the initial state, the network topology, and the communication pattern transmitted from the reception unit 301. When the rank location change unit 302 does not receive the information on the initial state, it determines that the computing node group 100 did not generate the initial state, and generates the initial state based on the information on the communication pattern. The rank location change unit 302 then conforms the aspect ratio of the system in molecular dynamics (MD) to the aspect ratio of the received network topology. Therefore, when the network topology is 16×8×8, the aspect ratio of the system becomes 16×8×8. The present embodiment uses the concept of molecular dynamics as described above. The aim is to change the positions of the ranks from the simulation result as little as possible in the process of aligning the locations of the ranks described later. The rank in the present embodiment corresponds to the atom in molecular dynamics.
The rank location change unit 302 calculates an attracting force corresponding to the communication traffic between the ranks and the distance between the ranks, or a repulsive force corresponding to the distance between the ranks, based on the distance between the ranks obtained from the initial state and on the communication amount and the number of communications included in the communication pattern. The communication traffic may be called communication load. Although the details will be described later, an attracting force or a repulsive force is generated between the ranks depending on the distance between the ranks. The rank location change unit 302 calculates the attracting force or the repulsive force, and then changes the locations of the ranks, i.e., the position of each rank, by applying at least one of the attracting force and the repulsive force between the ranks. The rank location change unit 302 transmits the changed locations of the ranks to the mapping information generating unit 303. The specifics of the rank location change unit 302 will be described later.
The mapping information generating unit 303 generates the mapping information by assigning the ranks of which the locations have been changed to the nodes 110 depending on the network topology while keeping the changed locations of the ranks transmitted from the rank location change unit 302. The changed locations of the ranks that have been transmitted do not correspond to the nodes 110. Thus, the mapping information generating unit 303 moves the changed locations of the ranks to the positions of the nodes 110 to associate the ranks to the nodes 110 on a one-to-one basis. Hereinafter, although the details will be described later, the process that moves the changed location of the rank to the position of the node 110 is called an alignment process. The mapping information generating unit 303 transmits the aligned locations of the ranks by the alignment process, i.e., the generated mapping information, to the mapping information evaluation unit 304.
The mapping information evaluation unit 304 receives the mapping information transmitted from the mapping information generating unit 303. The mapping information evaluation unit 304 evaluates the received mapping information by using predetermined evaluation formulas described later. The mapping information evaluation unit 304 determines that a positive evaluation result is obtained when the evaluation value is improved compared to the evaluation value obtained last time, and outputs the mapping information to the mapping information storing unit 305. At this time, the mapping information evaluation unit 304 may output the improved evaluation value as the positive evaluation result together with the mapping information. On the other hand, the mapping information evaluation unit 304 determines that a negative evaluation result is obtained when the evaluation value is not improved compared to the evaluation value obtained last time, and notifies the mapping information generating unit 303 of the negative evaluation result. In response, the mapping information generating unit 303 transmits the changed locations of the ranks that have been kept, i.e., the locations of the ranks before the alignment process, to the rank location change unit 302. The rank location change unit 302 changes the locations of the ranks again when receiving the locations of the ranks before the alignment process. Repeating the above-described process makes it possible to finally obtain further improved mapping information.
A description will next be given of the operation of the mapping information generating apparatus 300 with reference to
The reception unit 301 receives the initial state, the network topology, and the communication pattern transmitted from the terminal device 400 (step S101). When the rank location change unit 302 determines that it does not receive the initial state (step S101A: YES), it generates the initial state (step S101B). Thus, the rank location change unit 302 places multiple ranks in a space constructed on a computer.
When the rank location change unit 302 ends the process of step S101B, or determines that it receives the initial state (step S101A: NO), it calculates an attracting force with a magnitude corresponding to the communication traffic between the ranks and the distance between the ranks, a repulsive force with a magnitude corresponding to the distance between the ranks, and a resultant force obtained by combining the attracting force and the repulsive force (step S102). More specifically, the rank location change unit 302 calculates communication traffic Ci,j of the communication between a rank i and a rank j based on the communication amount and the number of communications between the rank i and the rank j included in the communication pattern, and the following formula (1). The value “20000” included in the formula (1) is a constant, and the constant may be changed as appropriate. The formula (1) defines, as the communication traffic Ci,j, the larger one of the value “1” and the product of the value “20000”, the communication amount, and the number of communications. If the product alone were defined as the communication traffic Ci,j, the number of communications would be zero when no communication occurs, and the product would also be zero. Accordingly, the value of the communication traffic Ci,j would be zero, and an attracting force fi,j described later would not be generated. To avoid such a situation, the formula (1) defines the larger one of the product and the value “1” as the communication traffic Ci,j so that the attracting force is always generated.
$$C_{i,j} = \max(20000 \times \text{communication amount} \times \text{number of communications},\ 1) \quad (1)$$
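The formula (1) can be written as a one-line function. The function and parameter names below are illustrative; only the constant 20000 and the lower bound of 1 come from the formula.

```python
def communication_traffic(amount, count, scale=20000):
    """Communication traffic C_ij per formula (1).

    amount : communication amount between rank i and rank j.
    count  : number of communications between rank i and rank j.
    scale  : the constant "20000", which may be changed as appropriate.
    The lower bound of 1 keeps the traffic non-zero when no communication occurs.
    """
    return max(scale * amount * count, 1)
```

With no communication between two ranks (`count` of zero), the function returns 1 rather than 0, so a weak attracting force still acts between them.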
The rank location change unit 302 then calculates the attracting force fi,j acting between the rank i and the rank j by using the following formula (2) when the distance |ri-rj| between the rank i and the rank j is greater than a threshold value L2 that is a predetermined reference value. According to the formula (2), as the amount of the communication traffic Ci,j increases, the attracting force fi,j acting between the rank i and the rank j increases. As a result, the ranks between which the amount of the communication traffic Ci,j is large are placed near each other. According to the formula (2), as the distance between the rank i and the rank j increases, the attracting force fi,j acting between the rank i and the rank j increases. That is to say, as the amount of the communication traffic Ci,j increases, and as the distance between the rank i and the rank j increases, the attracting force fi,j with a larger magnitude is generated. For example, in molecular dynamics, by the effect of van der Waals force, the atoms strongly repel each other when the atoms come close to each other, while the atoms attract one another with small force when the atoms are distanced from each other. The present embodiment does not use van der Waals force itself, and applies the force different from van der Waals force between the rank i and the rank j.
On the other hand, the rank location change unit 302 calculates a repulsive force fi,j acting between the rank i and the rank j by using the following formula (3) when the distance |ri-rj| between the rank i and the rank j is less than the threshold value L2 and is greater than a predetermined threshold value L1. According to the formula (3), as the distance between the rank i and the rank j decreases, the repulsive force with a larger magnitude is generated. The value “−600” included in the formula (3) is a constant, and the constant may be changed as appropriate.
Furthermore, the rank location change unit 302 calculates the repulsive force fi,j acting between the rank i and the rank j by using the following formula (4) when the distance |ri-rj| between the rank i and the rank j is less than the threshold value L1. According to the formula (4), as the distance between the rank i and the rank j further decreases, the repulsive force with a magnitude greater than that of the repulsive force obtained by the formula (3) is generated. The value “−50000” included in the formula (4) is a constant, and the constant may be changed as appropriate.
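The selection among the three distance regimes can be sketched as follows. The formulas (2) through (4) themselves are not reproduced in this text, so the functional forms below (a linear attraction and two inverse-distance repulsions) are assumptions chosen only to match the qualitative behavior described: the attraction grows with the traffic and with the distance, and the repulsion grows as the distance shrinks and is stronger below L1. The sign convention (positive attracts, negative repels), the handling of distances exactly equal to a threshold, and the function name `pair_force` are likewise assumptions.

```python
def pair_force(distance, traffic, l1, l2, weak=-600.0, strong=-50000.0):
    """Signed force between two ranks: positive attracts, negative repels.

    distance : |r_i - r_j|.
    traffic  : communication traffic C_ij.
    l1, l2   : threshold values with l1 < l2.
    weak, strong : the constants "-600" and "-50000"; how they enter the
                   real formulas (3) and (4) is NOT known from this text.
    """
    if distance > l2:
        # Assumed form: attraction proportional to traffic and to distance beyond l2.
        return traffic * (distance - l2)
    d = max(distance, 1e-9)  # guard against division by zero for coincident ranks
    if distance > l1:
        # Assumed form: moderate repulsion that grows as the ranks approach.
        return weak / d
    # Assumed form: strong repulsion at very short range.
    return strong / d
```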
Thus, the relationship between the attracting force corresponding to the communication traffic between the ranks and the distance between the ranks and the repulsive force corresponding to the distance between the ranks is represented by the graph illustrated in
The rank location change unit 302 calculates a resultant force Fj finally acting on the rank j by using the attracting force or the repulsive force calculated as described above and the following formula (5). According to the formula (5), as illustrated in
The rank location change unit 302 then applies the calculated resultant force Fj to each rank j, and changes the location of each rank j (step S103). As a result, the ranks, which concentrate on one point in each group in the initial state as illustrated in
When the rank location change unit 302 completes changing the locations of the ranks, the mapping information generating unit 303 then executes the alignment process on the changed locations of the ranks to generate the mapping information (step S104).
Here, with reference to
The mapping information generating unit 303 sets an initial radius R0 as a radius R of a circle 10 centered at a grid center c of the grid G illustrated in
The mapping information generating unit 303 then moves a rank, which is located closest to one of grid points that are located outside the circle 10 with the radius R and in which no rank is placed (grid points at position (xg, yg)), to the closest grid point of the above grid points (step S202). More specifically, as illustrated in FIG. 12A, the mapping information generating unit 303 specifies the grid points g1, g2, g3, g4 that are located outside the circle 10 with the radius R and in which no rank is placed. The mapping information generating unit 303 then specifies the rank r1 located closest to the grid point g1, the rank r2 located closest to the grid point g2, the rank r3 located closest to the grid point g3, and the rank r4 located closest to the grid point g4. Finally, as illustrated in
The mapping information generating unit 303 then moves a rank located outside the circle 10 with the radius R to a grid point (a grid point at position (xn, yn)) to which the distance from the rank is shortest and in which no rank is placed (step S203). More specifically, as illustrated in
When the process of step S203 is completed, the mapping information generating unit 303 sets a new radius R smaller than the present radius R by ΔR (step S204), and determines whether the new radius R is zero (step S205). When the mapping information generating unit 303 determines that the new radius R is not zero (step S205: NO), the aforementioned processes of steps S202 and S203 are repeated. This allows the mapping information generating unit 303 to map ranks to the grid points concentrically, starting with the grid points farthest from the grid center c and proceeding inward. When the mapping information generating unit 303 determines that the new radius R is zero (step S205: YES), the mapping information generating unit 303 ends the alignment process. The mapping information generating unit 303 transmits the locations of the ranks after the alignment process to the mapping information evaluation unit 304 as the mapping information.
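The alignment process of steps S201 through S205 can be sketched in two dimensions as follows. This is a simplified reading, not the embodiment itself: the function name `align_to_grid` and its parameters are hypothetical, points lying exactly on the circle are treated as outside it, and one final pass is made at radius zero so that every rank ends up on a grid point. The last two choices are assumptions of this sketch.

```python
import math


def align_to_grid(ranks, grid_points, center, r0, dr):
    """Assign each rank to a distinct grid point, working inward from radius r0.

    ranks       : dict rank_id -> (x, y) location after the force-driven move.
    grid_points : list of (x, y) node positions, at least as many as ranks.
    center      : grid center c.
    r0, dr      : initial radius R0 and decrement delta R (dr must be positive).
    Returns a dict rank_id -> grid point.
    """
    if dr <= 0:
        raise ValueError("dr must be positive")
    free = set(grid_points)
    unplaced = dict(ranks)
    placed = {}
    radius = r0  # S201
    while unplaced:
        # S202: each empty grid point outside the circle takes its closest rank.
        for g in sorted(p for p in free if math.dist(p, center) >= radius):
            if not unplaced:
                break
            rid = min(unplaced, key=lambda r: math.dist(unplaced[r], g))
            placed[rid] = g
            free.discard(g)
            del unplaced[rid]
        # S203: each remaining rank outside the circle moves to its nearest empty grid point.
        for rid in sorted(unplaced):
            if not free:
                break
            if math.dist(unplaced[rid], center) >= radius:
                g = min(free, key=lambda p: math.dist(p, unplaced[rid]))
                placed[rid] = g
                free.discard(g)
                del unplaced[rid]
        if radius <= 0:  # S205, after the assumed final pass at radius zero
            break
        radius = max(radius - dr, 0.0)  # S204
    return placed
```

Because every assignment consumes one rank and one grid point, the result is one-to-one whenever the number of grid points is at least the number of ranks.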
Back to
When the process of step S104 is completed, the mapping information evaluation unit 304 calculates the evaluation value E of the mapping information with a predetermined evaluation formula (step S105). The predetermined evaluation formula is represented by the following formula (6).
Here, hopi,j in the formula (6) represents the number of communication hops between the rank i and the rank j. Sizei,j in the formula (6) represents the communication amount between the rank i and the rank j. That is to say, the evaluation value E in the formula (6) represents the sum, over all combinations of the rank i and the rank j, of the product of the number of communication hops and the communication amount. According to the formula (6), when the ranks between which a large amount of communication is performed are located so that the number of communication hops between them is small, the evaluation value E is small.
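The evaluation value E described above, the sum of hops multiplied by communication amount over rank pairs, can be sketched as follows. The function names, the dictionary-based data layout, and the use of shortest-path hop counts on a torus are assumptions made for illustration.

```python
def evaluation_value(mapping, sizes, shape):
    """Evaluation value E: sum of hop_ij * size_ij over communicating rank pairs.

    mapping : dict rank -> (x, y, z) node coordinates assigned to that rank.
    sizes   : dict (i, j) -> communication amount between rank i and rank j.
    shape   : number of nodes along each axis of the torus.
    """
    def hops(a, b):
        # Shortest-path hop count on a torus (an assumption of this sketch).
        total = 0
        for pa, pb, n in zip(a, b, shape):
            d = abs(pa - pb) % n
            total += min(d, n - d)
        return total

    return sum(hops(mapping[i], mapping[j]) * size
               for (i, j), size in sizes.items())
```

A smaller value means that heavily communicating ranks sit fewer hops apart, which is what the repeated generation of mapping information tries to achieve.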
After calculating the evaluation value E, the mapping information evaluation unit 304 determines whether the evaluation value E is improved (step S106). More specifically, when the evaluation value E′ that has been already calculated is stored in the mapping information storing unit 305, the mapping information evaluation unit 304 reads the evaluation value E′ from the mapping information storing unit 305. The mapping information evaluation unit 304 then compares the evaluation value E′ that has been read out with the evaluation value E most recently calculated. When the mapping information evaluation unit 304 determines that the evaluation value E′ is greater than the evaluation value E, it determines that the evaluation value E is improved (step S106: YES), and outputs the mapping information to the mapping information storing unit 305 (step S107). At that time, the mapping information evaluation unit 304 may output the evaluation value E to the mapping information storing unit 305 together with the mapping information.
When the process of step S107 is completed, the mapping information evaluation unit 304 determines whether the evaluation value E is less than an evaluation threshold value (step S108). The evaluation threshold value is a threshold value used to determine whether the mapping information is sufficiently optimized. When the mapping information evaluation unit 304 determines that the evaluation value E is less than the evaluation threshold value (step S108: YES), it ends the process.
On the other hand, when the mapping information evaluation unit 304 determines that the evaluation value E is not improved at step S106 (step S106: NO), or determines that the evaluation value E is not less than the evaluation threshold value (step S108: NO), it determines whether a time step ts has reached the upper limit T (step S109). Here, the time step ts represents the number of times that the mapping information is generated. The upper limit T may be determined in advance.
When the mapping information evaluation unit 304 determines that the time step ts has reached the upper limit T (step S109: YES), it ends the process. On the other hand, when the mapping information evaluation unit 304 determines that the time step ts has not reached the upper limit T (step S109: NO), it repeats the processes from step S102 to step S108. Thus, the mapping information evaluation unit 304 generates and evaluates the mapping information repeatedly till the time step ts reaches the upper limit T (e.g., 4000 time steps). In the process, when the mapping information evaluation unit 304 calculates the evaluation value E less than the evaluation threshold value, it stores the mapping information that has been used to calculate the evaluation value E in the mapping information storing unit 305. In contrast, when the mapping information evaluation unit 304 calculates the evaluation value E greater than or equal to the evaluation threshold value, it causes the time step ts to increase, and the mapping information generating unit 303 generates new mapping information. Then, the mapping information evaluation unit 304 evaluates the new mapping information.
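The control flow of steps S102 through S109 can be sketched as a loop that takes the three operations as callables. The function `optimize_mapping` and its parameter names are hypothetical, and treating the first evaluation as an improvement when no earlier value is stored is an assumption of this sketch.

```python
def optimize_mapping(positions, change_locations, align, evaluate,
                     threshold, max_steps):
    """Repeat move, align, and evaluate until E is small enough or T is reached.

    change_locations : positions -> new positions (steps S102 and S103).
    align            : positions -> mapping information (step S104).
    evaluate         : mapping information -> evaluation value E (step S105).
    threshold        : evaluation threshold value (step S108).
    max_steps        : upper limit T of the time step ts (step S109).
    Returns the best mapping information found and its evaluation value.
    """
    best_mapping, best_value = None, float("inf")
    for ts in range(max_steps):
        # The pre-alignment positions are carried over to the next time step.
        positions = change_locations(positions)
        mapping = align(positions)
        value = evaluate(mapping)
        if value < best_value:                       # S106: improved
            best_mapping, best_value = mapping, value  # S107: store
            if value < threshold:                    # S108: good enough
                break
    return best_mapping, best_value
```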
With reference to
Here, the relationship between the coordinates of each rank before the move and the coordinates after the move is represented by the following formulas (7) through (12). In the formulas (7) through (9), k is a preset constant that represents the travel distance. Thus, the travel distances of the first rank, the second rank, the third rank, and the fourth rank in
$$x_j^{n+1} = x_j^{n} + k\,\Delta x_j \quad (7)$$
$$y_j^{n+1} = y_j^{n} + k\,\Delta y_j \quad (8)$$
$$z_j^{n+1} = z_j^{n} + k\,\Delta z_j \quad (9)$$
$$\Delta x_j = F_{x,j} / |\vec{F}| \quad (10)$$
$$\Delta y_j = F_{y,j} / |\vec{F}| \quad (11)$$
$$\Delta z_j = F_{z,j} / |\vec{F}| \quad (12)$$
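The formulas (7) through (12) move each rank by the fixed distance k along the direction of its resultant force. A minimal sketch follows; the function name `move_rank` is hypothetical, and leaving a rank in place when the resultant force is zero is an assumption, since the formulas are undefined in that case.

```python
import math


def move_rank(position, force, k):
    """New coordinates of one rank per formulas (7) through (12).

    position : (x, y, z) coordinates at time step n.
    force    : (Fx, Fy, Fz) components of the resultant force on the rank.
    k        : preset travel distance.
    """
    norm = math.sqrt(sum(c * c for c in force))
    if norm == 0.0:
        # Assumption: with no net force the rank stays where it is.
        return tuple(position)
    # Each component moves by k times the unit vector of the force.
    return tuple(p + k * f / norm for p, f in zip(position, force))
```

Because the force is normalized, every rank travels the same distance k per time step regardless of the magnitude of the force acting on it.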
A description will be given of the mapping information with reference to
As described above, the mapping information generating apparatus 300 in accordance with the present embodiment places multiple ranks in a space constructed on a computer, and changes the positions of the multiple ranks by applying at least one of an attracting force and a repulsive force between each two ranks included in the multiple ranks in the space. The mapping information generating apparatus 300 then generates mapping information that maps the multiple ranks to the multiple nodes 110 based on the changed positions of the multiple ranks and the obtained positions of the multiple nodes 110. When the Simulated Annealing is employed in a large scale parallel computing system S, a large quantity of calculation is required to generate the mapping information. However, the use of the mapping information generating apparatus 300 in accordance with the present embodiment can reduce the amount of computation required to generate the mapping information even in the large scale parallel computing system S.
A description will be given of the difference in the evaluation value E between when the initial state is present and when the initial state is absent with reference to
As illustrated in
As described above, the evaluation value E calculated with use of the initial state becomes less than the evaluation value E calculated with use of the state different from the initial state. Thus, the use of the initial state makes it possible to obtain mapping information more appropriate than the mapping information generated with use of the state different from the initial state. The evaluation value E calculated based on the state different from the initial state without using the mapping information generating apparatus 300 of the present embodiment converges on E=31608. Thus, even when the initial state is not used, if the state different from the initial state is used and the mapping information generating apparatus 300 of the present embodiment is used, the mapping information can be generated with a small amount of computation even in the large scale parallel computing system S.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention. For example, a history of how the ranks move illustrated in
Number | Date | Country | Kind |
---|---|---|---|
2015-043370 | Mar 2015 | JP | national |