This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2011-063400, filed on Mar. 22, 2011, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are directed to a data transmitting device, a parallel computer system, and a controlling method of the data transmitting device.
In the related art, a parallel computer system that includes plural computation nodes is known. As an example of the parallel computer system, there is a known parallel computer system where each computation node is provided with a router to relay communication between plural computation nodes.
The CPU 61 executes a program allocated to the CPU itself and sends, to the NIC 62, information to be transmitted to the other CPUs 61a to 61e. The NIC 62 packetizes the information that the CPU 61 sends to the other CPUs 61a to 61e and transmits the resulting packets to the router 63. The router 63 has an input port that receives packets from the NIC 62 and an output port that transmits packets to the router 63a, and it forwards the packets received from the NIC 62 through the input port to the router 63a through the output port.
As illustrated in portion (A) of
In this case, when each of the routers 63 to 63e performs adjustment between the ports using a round-robin system, each router gives its input ports an equal chance to win, because the priority between the ports as viewed from the entire parallel computer system 50 cannot be determined locally. For this reason, it is difficult for the parallel computer system 50 to allocate the band equally to communication between the computation nodes 60 to 60e.
Next, an example will be described in which packets are transmitted from the CPUs 61 to 61d to the CPU 61e and, in each of the routers 63 to 63d, packets always join through the two other receiving ports, as illustrated in portions (A) to (E) of
For this reason, the parallel computer system 50 can allocate “1/4” of the entire band to the communication between the computation node 60d and the computation node 60e. However, since each of the routers 63 to 63d allocates only “1/4” of the band of its output port to the communication between the computation node 60 and the computation node 60e, the parallel computer system 50 can allocate only about “(1/4)^5”, that is, 1/1024, of the entire band to that communication.
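The round-robin decay described above can be checked with a short calculation. This is a simplified sketch, assuming that every router shares its output port equally among four competing input ports and that every flow always has packets waiting; the function name `round_robin_share` is hypothetical.

```python
from fractions import Fraction


def round_robin_share(hops: int, competing_inputs: int = 4) -> Fraction:
    """Fraction of the band left for a flow that crosses `hops` routers,
    each of which shares its output port equally among `competing_inputs`."""
    return Fraction(1, competing_inputs) ** hops


print(round_robin_share(1))  # 1/4    (node 60d, one router)
print(round_robin_share(5))  # 1/1024 (node 60, routers 63 to 63d)
```

Each additional router multiplies the share by 1/4, so the penalty grows geometrically with the hop count.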
Therefore, a technology is known in which the time elapsed since a packet was transmitted, or the number of routers that have relayed the packet, is stored in the packet header as age information, and adjustment is performed on the basis of the age information stored in the packet headers.
In this case, the “age” field stores the time elapsed since packet transmission or the number of routers that have relayed the packet. The “carry” field stores carry-out bits. The “epoch” field stores information that is referenced when the age information is updated. The 1-bit “unused” field is not used.
When the parallel computer system 50 performs adjustment on the basis of the age information stored in the packet, control parameters for the age information are set in advance according to the topology of the network connecting the plural computation nodes included in the parallel computer system 50 or the scale of the job executed by the computation nodes 60 to 60e. For example, “AGE_CLOCK_PERIOD”, which indicates the time interval at which the value stored in “age” is incremented, is set as a control parameter. In addition, “REQ_AGE_BIAS” and “RSP_AGE_BIAS”, which indicate the value added to the age each time the packet passes through one router, are set as control parameters. In addition, “AGE_RR_SELECT”, which indicates the ratio between adjustment using the round-robin system and adjustment using the age information in the routers 63 to 63e, is set as a control parameter.
When the routers 63 to 63e perform the adjustment on the basis of the age information, they update the age information of each received packet on the basis of the set control parameters and compare the updated “age” values. The routers 63 to 63e then transmit the packets to the next router in descending order of the value stored in “age”. Thus, the parallel computer system 50 having the routers 63 to 63e preferentially transmits packets with a longer elapsed time since transmission by the source computation node, or with a longer travel distance. Therefore, the longer the distance between the communicating computation nodes, the wider the band allocated to their communication.
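The age-based adjustment of the related art can be illustrated with a minimal sketch. It assumes that age simply grows by a fixed amount per router hop (a hypothetical stand-in for REQ_AGE_BIAS) and does not model AGE_CLOCK_PERIOD, the carry or epoch fields, or the AGE_RR_SELECT mixing ratio; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for REQ_AGE_BIAS: age added each time a packet
# passes through one router. AGE_CLOCK_PERIOD and AGE_RR_SELECT are not modeled.
HOP_AGE_BIAS = 1


@dataclass
class AgedPacket:
    source: str
    age: int = 0  # value carried in the header "age" field


def arbitrate_by_age(candidates: list[AgedPacket]) -> AgedPacket:
    """Age every competing packet by one hop, then grant the oldest one.

    max() returns the first of several equal maxima, so ties go to the
    earliest candidate in the list (a real router might use round-robin).
    """
    for pkt in candidates:
        pkt.age += HOP_AGE_BIAS
    return max(candidates, key=lambda p: p.age)


# A packet that has already crossed four routers beats a freshly injected one.
far = AgedPacket("node 60", age=4)
near = AgedPacket("node 60d")
winner = arbitrate_by_age([near, far])
print(winner.source)  # node 60
```

The winner is decided purely by latency-like information, which is the property the following paragraphs identify as the weakness of this approach.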
However, with the technology that performs the adjustment using the age information, the band is distributed on the basis of information indicating latency, such as the time elapsed since the computation node transmitted the packets or the number of routers relaying the packets. For this reason, the routers 63 to 63e do not appropriately distribute the band to the communication between the computation nodes.
That is, adjusting the competition between input ports is a different process from adjusting the latency of packets: it is the process of distributing the band of the parallel computer system 50 to the communication between the computation nodes 60 to 60e. For this reason, the band may not be appropriately distributed to the communication between the computation nodes 60 to 60e using latency information, which differs from information about the band.
If the parallel computer system 50 performs the adjustment using information indicating latency when the computation nodes 60 to 60e perform multi-point to multi-point communication, the parallel computer system 50 cannot equalize the number of routers relaying the packets. That is, since the parallel computer system 50 cannot equalize the deviation in packet latency, it cannot appropriately allocate the band to the communication between the computation nodes 60 to 60e.
Furthermore, since the routers 63 to 63e store the time elapsed since packet transmission or the number of routers relaying the packets in “age”, the bit width of the field that stores “age” increases. As a result, the bit width of the field that stores the data decreases, and data transmission efficiency in the communication between the computation nodes may deteriorate.
According to an aspect of an embodiment of the invention, a data transmitting device includes a receiving unit that receives data from a plurality of computation nodes that transmit data to each other. The data transmitting device further includes an acquiring unit that acquires, from each piece of received data, a cumulative number of other data that the data has competed with in the adjustments performed at the computation nodes before the data is received by the receiving unit. The data transmitting device further includes an updating unit that updates the cumulative number acquired from each piece of data by the acquiring unit, on the basis of the number of pieces of data received by the receiving unit. The data transmitting device further includes an adjusting unit that adjusts the data received by the receiving unit on the basis of the cumulative number updated by the updating unit and selects data to be transmitted to the computation nodes. The data transmitting device further includes a storing unit that stores the cumulative number updated by the updating unit in the data selected by the adjusting unit. The data transmitting device further includes a transmitting unit that transmits the data in which the cumulative number is stored by the storing unit to another device.
The object and advantages of the embodiment will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the embodiment, as claimed.
Preferred embodiments of the present invention will be explained with reference to the accompanying drawings. Specifically, a data transmitting device, a parallel computer system, and a control method of the data transmitting device according to the embodiments will be described.
In the following first embodiment, an example of a parallel computer system 1 will be described using
As illustrated in
The computation node 2 is an information processing device that processes information. Specifically, the computation node 2 has a CPU 3, a NIC 4, and a router 10. The CPU 3 is an operation processing device that executes operations. For example, the CPU 3 executes the operation processing of a task allocated to the computation node 2. When the CPU 3 transmits data to the CPUs 3a to 3e of the other computation nodes 2a to 2e, the CPU 3 sends, to the NIC 4, an identifier indicating the destination CPU together with the data to be transmitted.
The NIC 4 packetizes the data received from the CPU 3 and transmits the packet data to the router 10.
As illustrated by (A) of
For example, when the NIC 4 receives data to be transmitted from the CPU 3 to the CPU 3e, the NIC 4 stores the received data in the data portion of a packet and stores, in the header, identification information indicating the CPU 3e as the destination of the packet. The NIC 4 also stores “1” in the header as the initial value of the joining number, together with a flag designating that the adjustment be performed using the joining number. Then, the NIC 4 transmits the packet containing this information to the router 10.
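The packet built by the NIC 4 can be modeled as a small record. This is a sketch only: the field names are hypothetical, and the actual header layout and bit widths are not specified here.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest_cpu: str                          # destination identification in the header
    payload: bytes                         # data portion of the packet
    joining_number: int = 1                # initial joining number set by the NIC
    adjust_by_joining_number: bool = True  # flag: adjust using the joining number


def packetize(dest_cpu: str, data: bytes) -> Packet:
    """Model of the NIC 4 wrapping CPU data into a packet (field names hypothetical)."""
    return Packet(dest_cpu=dest_cpu, payload=data)


pkt = packetize("CPU 3e", b"\x01\x02")
```

Every packet therefore leaves its source NIC with a joining number of 1, and only routers where it competes with other packets increase that value.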
Returning to
Herein, a specific example of a process that is executed by the router 10 will be described using the drawing.
An arrow that is illustrated by a thick line of
In the description below, the port 0, the port 1, and the ports 2 and 3 are connected by physical links to the NIC 4, the computation node 2a, and other computation nodes adjacent to the computation node 2, respectively. The processes executed by the port 1 reception processing unit 12, the port 2 reception processing unit 13, and the port 3 reception processing unit 14 are the same as the process executed by the port 0 reception processing unit 11, and their description will not be repeated. The processes executed by the port 1 transmission processing unit 16, the port 2 transmission processing unit 17, and the port 3 transmission processing unit 18 are the same as the process executed by the port 0 transmission processing unit 15, and their description will not be repeated.
When the port 0 reception processing unit 11 receives a packet, it determines the transmission destination of the received packet. The port 0 reception processing unit 11 sends the adjusting circuit 20 a transmission request for the transmission processing unit of the port corresponding to that destination. The port 0 reception processing unit 11 also acquires the joining number from the header portion of the received packet and sends the acquired joining number to the adjusting circuit 20. When the port 0 reception processing unit 11 receives a transmission permission notification from the adjusting circuit 20, it transmits the received packet to the data path switch 30.
For example, when the port 0 reception processing unit 11 receives a packet whose destination is the computation node 2, it sends the adjusting circuit 20 a transmission request for the port 0. When the port 0 reception processing unit 11 receives a packet whose destination is one of the computation nodes 2a to 2e, it sends the adjusting circuit 20 a transmission request for the port 1.
When a packet whose destination is the computation node 2 is selected by the adjusting circuit 20, described below, the port 0 transmission processing unit 15 receives the packet through the data path switch 30 and transmits it to the NIC 4. Similarly, when the port 1 transmission processing unit 16 receives, through the data path switch 30, a packet whose destination is one of the computation nodes 2a to 2e, that is, a packet to be transmitted to the computation node 2a, it transmits the received packet to the computation node 2a.
The transmission processing units 15 to 18 of the ports receive packets through the data path switch 30 and receive the updated joining numbers from the adjusting circuit 20. Each transmission processing unit stores the joining number received from the adjusting circuit 20 as the new joining number in the header of the packet received from the data path switch 30, and then transmits the packet to the NIC 4 or to the computation node connected to it.
The selection of the port that transmits the packet depends on the destination of each packet. For example, in the case of fixed routing, the port that transmits the packet according to destination information of the packet header is uniquely determined. In the case of adaptive routing, the port that transmits the packet is determined according to an adopted algorithm.
In this case, when packets are received from plural ports and the transmitting ports determined from their destinations are the same, competition occurs for the transmission processing unit of that transmitting port. The packet to be transmitted is then selected by the adjusting circuit 20, described below. The reception processing unit of the port that received the selected packet transmits the packet through the data path switch 30 to the transmission processing unit of the transmitting port. The transmission processing unit that receives the packet updates the joining number of the packet and then transmits it to the NIC 4 or to the computation node connected to it.
For example, suppose that the port 0 reception processing unit 11 receives a packet to be transmitted through the port 1 transmission processing unit 16, the port 1 reception processing unit 12 receives a packet to be transmitted through the port 2 transmission processing unit 17, and the port 2 reception processing unit 13 receives a packet to be transmitted through the port 1 transmission processing unit 16. In this case, the router 10 executes the following process.
That is, the router 10 adjusts between the packet received by the port 0 reception processing unit 11 and the packet received by the port 2 reception processing unit 13, which compete for the port 1 transmission processing unit 16. When the adjusting circuit 20 selects the packet received by the port 0 reception processing unit 11, the router 10 transmits that packet through the port 1 transmission processing unit 16 and, at the same time, transmits the packet received by the port 1 reception processing unit 12 through the port 2 transmission processing unit 17. That is, when the resources of the transmitting ports (the transmission processing units 15 to 18) do not compete for the received packets, the received packets are transmitted simultaneously in parallel.
The adjusting circuit 20 determines, on the basis of the transmission requests received from the reception processing units 11 to 14, whether competition occurs for each port. When it determines that competition occurs for a port, the adjusting circuit 20 executes the following process for that port. That is, the adjusting circuit 20 updates the joining number received from each of the reception processing units 11 to 14 on the basis of the number of packets received by the reception processing units 11 to 14. The adjusting circuit 20 then performs the adjustment for the transmitting port on the basis of the updated joining numbers. Finally, the adjusting circuit 20 sends transmission permission to the reception processing unit that received the packet winning the adjustment and sends the updated joining number to the transmission processing unit of the port that transmits the packet.
Meanwhile, the adjusting circuit 20 executes the following process with respect to the port where the competition is not generated. That is, the adjusting circuit 20 transmits the transmission permission to the reception processing unit that receives the packet to be transmitted using the port where the competition is not generated.
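The competition check performed by the adjusting circuit 20 amounts to grouping transmission requests by output port. The following is a hypothetical software model (the function name and the dictionary encoding of requests are assumptions; the router does this in hardware). The usage reproduces the example in which ports 0 and 2 both request port 1 while port 1 requests port 2.

```python
from collections import defaultdict


def classify_requests(
    requests: dict[int, int],
) -> tuple[dict[int, int], dict[int, list[int]]]:
    """Split transmission requests into uncontested and contested ones.

    requests maps input port -> requested output port.
    Returns (granted: output -> sole input, contested: output -> competing inputs).
    """
    by_output: dict[int, list[int]] = defaultdict(list)
    for in_port, out_port in sorted(requests.items()):
        by_output[out_port].append(in_port)
    # No competition: permission can be given immediately.
    granted = {out: ins[0] for out, ins in by_output.items() if len(ins) == 1}
    # Competition: these inputs must go through the joining-number adjustment.
    contested = {out: ins for out, ins in by_output.items() if len(ins) > 1}
    return granted, contested


granted, contested = classify_requests({0: 1, 1: 2, 2: 1})
print(granted, contested)  # {2: 1} {1: [0, 2]}
```

Uncontested outputs proceed in parallel, and only the contested output port needs the adjusting process.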
Herein, an example of a process of adjusting the packet received by the router 10 and selecting the packet to be transmitted to the router 10a by the adjusting circuit 20 will be described using the drawing.
That is, the adjusting circuit 20 illustrated in
In an example illustrated in
When the joining number updating unit 21 receives the collision number from the collision counter 27, the joining number updating unit 21 adds a value obtained by subtracting 1 from the received collision number to the value stored in the register A 22 and the register B 23. When the joining number updating unit 21 receives information indicating adjustment loss from the adjusting unit 28, the joining number updating unit 21 adds 1 to the value stored in the register B 23. When the joining number updating unit 21 receives transmission permission from the adjusting unit 28, the joining number updating unit 21 transmits the value stored in the register A 22 to the selecting unit 29.
The joining number updating unit 24 is associated with the port 1 reception processing unit 12. When the joining number updating unit 24 receives the joining number from the port 1 reception processing unit 12, the joining number updating unit 24 stores the received joining number in a register A 25 and a register B 26. When the joining number updating unit 24 receives the collision number from the collision counter 27, the joining number updating unit 24 adds a value obtained by subtracting 1 from the received collision number to the values stored in the register A 25 and the register B 26. When the joining number updating unit 24 receives information indicating adjustment loss from the adjusting unit 28, the joining number updating unit 24 adds 1 to the value stored in the register B 26. When the joining number updating unit 24 receives the transmission permission from the adjusting unit 28, the joining number updating unit 24 transmits the value stored in the register A 25 to the selecting unit 29.
When the collision counter 27 receives the transmission requests from the port 0 reception processing unit 11 and the port 1 reception processing unit 12, the collision counter 27 counts the number of received transmission requests and transmits the counted number as the collision number to the joining number updating units 21 and 24. When the collision counter 27 receives the transmission requests from the plural reception processing units, that is, the port 0 reception processing unit 11 and the port 1 reception processing unit 12, the collision counter 27 transmits information indicating execution of the adjustment to the adjusting unit 28.
When the adjusting unit 28 receives the information indicating the execution of the adjustment, the adjusting unit 28 acquires the value stored in the register B 23 of the joining number updating unit 21 and the value stored in the register B 26 of the joining number updating unit 24. The adjusting unit 28 compares the acquired values, transmits the transmission permission to the joining number updating unit where the largest value is stored and the reception processing unit associated with the joining number updating unit, and transmits the information indicating the adjustment loss to the other joining number updating unit.
When there are a plurality of largest values among the acquired values, the adjusting unit 28 selects any one of the joining number updating units where the largest values are stored, using the round-robin system. The adjusting unit 28 transmits the transmission permission to the selected joining number updating unit and the reception processing unit associated with the selected joining number updating unit.
For example, when the value stored in the register B 23 is larger than the value stored in the register B 26, the adjusting unit 28 transmits the transmission permission to the port 0 reception processing unit 11 and the joining number updating unit 21, and transmits the information indicating the adjustment loss to the joining number updating unit 24. When the value stored in the register B 26 is larger than the value stored in the register B 23, the adjusting unit 28 transmits the transmission permission to the port 1 reception processing unit 12 and the joining number updating unit 24 and transmits the information indicating the adjustment loss to the joining number updating unit 21.
In this case, when the joining number updating units 21 and 24 receive information indicating adjustment loss from the adjusting unit 28, they add 1 to the values stored in the registers B 23 and 26, respectively. The adjusting unit 28 selects the packet to be transmitted to the router 10a on the basis of the values stored in the registers B 23 and 26. That is, the adjusting unit 28 selects the packet on the basis of a value obtained by adding the number of adjustment losses of the packet to the joining number that was read from the packet and updated according to the number of received packets. Because each loss raises the value in the register B, a packet that repeatedly loses eventually obtains the largest value, and the adjusting unit 28 can thus avoid deadlock in the adjusting process.
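The interplay of the registers A and B, the collision counter 27, and the adjusting unit 28 can be sketched in software. The class and function names are hypothetical, the round-robin pointer is one possible tie-breaking scheme, and the sketch assumes that the collision number is applied once per packet when its request arrives, which the description leaves open.

```python
class JoiningNumberUpdater:
    """Hypothetical model of one joining number updating unit (e.g. unit 21 or 24)."""

    def __init__(self) -> None:
        self.reg_a = 0  # register A: joining number to write into the outgoing packet
        self.reg_b = 0  # register B: register A value plus adjustment losses

    def load(self, joining_number: int) -> None:
        # Joining number read from the received packet's header.
        self.reg_a = joining_number
        self.reg_b = joining_number

    def on_collision(self, collision_number: int) -> None:
        # The packet competed with (collision_number - 1) other packets.
        self.reg_a += collision_number - 1
        self.reg_b += collision_number - 1

    def on_loss(self) -> None:
        # Only register B records lost adjustments.
        self.reg_b += 1


def adjust(units: list[JoiningNumberUpdater], rr_pointer: int) -> tuple[int, int]:
    """Grant the unit with the largest register B value.

    Ties are broken round-robin, starting from rr_pointer. Every other unit is
    told it lost. Returns (winning index, next round-robin pointer).
    """
    best = max(u.reg_b for u in units)
    tied = [i for i, u in enumerate(units) if u.reg_b == best]
    n = len(units)
    winner = min(tied, key=lambda i: (i - rr_pointer) % n)
    for i, u in enumerate(units):
        if i != winner:
            u.on_loss()
    return winner, (winner + 1) % n


def demo() -> list[tuple[int, int]]:
    """Port 0 gets a fresh packet (joining number 3) in each of three rounds;
    port 1 holds one packet with joining number 1. Returns (winning port,
    joining number stored in the transmitted packet) for each round."""
    units = [JoiningNumberUpdater(), JoiningNumberUpdater()]
    units[1].load(1)
    units[1].on_collision(2)  # assumption: counted once, when the packet arrives
    rr = 0
    log = []
    for _ in range(3):
        units[0].load(3)
        units[0].on_collision(2)  # two requests compete for the same output port
        winner, rr = adjust(units, rr)
        log.append((winner, units[winner].reg_a))
    return log


print(demo())  # [(0, 4), (0, 4), (1, 2)]
```

The port 1 packet starts with the smaller value and loses twice, but each loss raises its register B until it ties and wins; the joining number it carries onward (register A) is not inflated by those losses.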
When the adjusting unit 28 selects the transmitted packet, the adjusting unit 28 transmits the information indicating the port receiving the selected packets to the selecting unit 29 and the data path switch 30 illustrated in
When the selecting unit 29 receives the information indicating the port from the adjusting unit 28, the selecting unit 29 transmits the joining number that is transmitted from the joining number updating unit associated with the port indicated by the received information, to the port 2 transmission processing unit 17. That is, since the adjusting unit 28 illustrated in
For example, when the selecting unit 29 receives the information indicating the port 0 from the adjusting unit 28, the selecting unit 29 transmits the joining number transmitted from the joining number updating unit 21 to the port 2 transmission processing unit 17. When the selecting unit 29 receives the information indicating the port 1 from the adjusting unit 28, the selecting unit 29 transmits the joining number transmitted from the joining number updating unit 24 to the port 2 transmission processing unit 17.
The adjusting process may also be performed for a single packet. That is, when no value is stored in the register A or the register B of a joining number updating unit, the adjusting unit 28 treats the value as “0” and performs the adjustment. In this case, because the adjusting circuit 20 routes the packet to the transmission processing unit of the appropriate port even when no competition occurs, implementation becomes easy. In addition, if the joining number updating units 21 and 24 output the values of the register A 22 and the register A 25 only for the packet winning the adjustment and output “0” otherwise, the selecting unit 29 is not needed.
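One way to see why the selecting unit 29 becomes unnecessary: if every losing unit drives “0”, the outputs can simply be merged. The sketch below assumes a bitwise-OR merge; the description does not name the combining logic, and the function names are hypothetical.

```python
from functools import reduce
from operator import or_


def masked_outputs(reg_a_values: list[int], winner: int) -> list[int]:
    """Each joining number updating unit drives its register A value only if
    its packet won the adjustment, and 0 otherwise."""
    return [a if i == winner else 0 for i, a in enumerate(reg_a_values)]


def merge(outputs: list[int]) -> int:
    """With at most one non-zero output, a bitwise OR recovers the winner's value."""
    return reduce(or_, outputs, 0)


print(merge(masked_outputs([4, 2], winner=1)))  # 2
```

An empty register is likewise treated as 0, so a lone packet passes through the same path as a contested one.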
As such, the adjusting circuit 20 acquires the joining number stored in the packets received by the ports 0 and 1 and updates the acquired joining number according to the number of packets. The adjusting circuit 20 selects the packet where the updated joining number is largest as the packet to be transmitted to the router 10a. For this reason, the adjusting circuit 20 can equally allocate the band to the communication between the computation nodes 2 to 2e.
Returning to
When a packet cannot be transmitted, for example when the router 10a cannot receive the packet because its resources are exhausted, the router 10 need not perform the adjustment or update the joining number of each packet. In such a case, the transmission destination may be unable to receive the packet because it is a joining point where plural communications share the band. This state can be resolved using a process such as adaptive routing.
For example, the CPUs 3 to 3e, the reception processing units 11 to 14, the transmission processing units 15 to 18, the adjusting circuit 20, the joining number updating units 21 and 24, the collision counter 27, and the adjusting unit 28 are electronic circuits. As such an electronic circuit, an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA), or a central processing unit (CPU) or a micro processing unit (MPU), may be used.
Each of the register A 22, the register A 25, the register B 23, and the register B 26 is a semiconductor memory element such as a random access memory (RAM) or a flash memory. Each of the selecting unit 29 and the data path switch 30 is a switch that changes the output destination of the packet using information notified from the adjusting unit 28.
As described above, when the router 10 receives packets, it acquires from each packet the joining number, that is, the cumulative number of other packets that the packet has competed with in the adjusting processes it has participated in. The router 10 updates the acquired joining number according to the number of received packets, that is, the number of other packets competing in the current adjustment.
Here, the joining number stored in each packet is the cumulative number of other packets that the packet has competed with, and it indicates the degree to which the communications sharing the band overlap. That is, the joining number stored in each packet is directly related to the band allocated to the communications sharing the band. Therefore, when the router 10 performs the adjustment on the basis of the joining number stored in each packet, it performs the adjustment on the basis of information indicating the band, and the band can appropriately be allocated to the communication between the computation nodes 2 to 2e.
An example of the case where each of the computation nodes 2 to 2d illustrated in
In addition, “1/16” of the entire band is allocated to the communication between the computation node 2a and the computation node 2e, and the remaining “1/16” is allocated to the communication between the computation node 2 and the computation node 2e. Thus, when each of the routers 10 to 10e performs the adjustment using the round-robin system, the parallel computer system 1 can allocate only a narrow band to communication whose packets travel many hops, so its allocation of the band cannot be called appropriate. When the transmission destination ports compete among plural ports, the band allocated to the communication between the computation nodes 2 to 2e narrows.
In this case, when each of the routers 10 to 10e lets the packet with the largest joining number win the adjustment, the packet in the communication between the computation node 2d and the computation node 2e wins the adjustment at a rate of 1/5. For this reason, “1/5” of the entire band is allocated to the communication between the computation node 2d and the computation node 2e. Since the packet in the communication between the computation node 2c and the computation node 2e wins the adjustment at a rate of 1/4, “1/4” of the remaining band “1-1/5=4/5” is allocated to it. As a result, “1/5” of the entire band is allocated to the communication between the computation node 2c and the computation node 2e.
Since the packet in the communication between the computation node 2b and the computation node 2e wins the adjustment at a rate of 1/3, “1/3” of the remaining band “1-2/5=3/5”, that is, “1/5”, is allocated to the communication between the computation node 2b and the computation node 2e. Since the packet in the communication between the computation node 2a and the computation node 2e wins the adjustment at a rate of 1/2, “1/2” of the remaining band “1-3/5=2/5”, that is, “1/5”, is allocated to the communication between the computation node 2a and the computation node 2e. The remaining band “1/5” is allocated to the communication between the computation node 2 and the computation node 2e. That is, when the parallel computer system 1 lets the packet with the largest joining number win the adjustment, it can allocate the band equally to the communications between the computation nodes.
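Both allocations can be reproduced with exact fractions. This sketch assumes steady traffic and the per-router win rates stated in the text: 1/2 at every router for round-robin (the band halves per extra hop) and 1/5, 1/4, 1/3, 1/2 for the joining number. Packet sizes and queueing are not modeled, and the function name is hypothetical.

```python
from fractions import Fraction


def allocate(win_rates: list[Fraction]) -> list[Fraction]:
    """Walk from the router nearest the destination toward the farthest source.

    At each router the locally injected flow takes `rate` of the band that is
    still available and the rest is left for upstream flows; the final entry is
    the farthest flow, which receives whatever remains.
    """
    remaining = Fraction(1)
    shares = []
    for rate in win_rates:
        share = remaining * rate
        shares.append(share)
        remaining -= share
    shares.append(remaining)
    return shares


# Order of results: node 2d, 2c, 2b, 2a, then node 2.
round_robin = allocate([Fraction(1, 2)] * 4)
joining = allocate([Fraction(1, 5), Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)])
print([str(s) for s in round_robin])  # ['1/2', '1/4', '1/8', '1/16', '1/16']
print([str(s) for s in joining])      # ['1/5', '1/5', '1/5', '1/5', '1/5']
```

A win rate of 1/k at the router where k flows share the output is exactly what leaves every flow with the same 1/5 share.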
Next, the simulation result of a process of allocating the band by the parallel computer system 1 will be described. First, the simulation result of the band that is allocated to the communication between each of the computation nodes 2 to 2d and the computation node 2e when each of the computation nodes 2 to 2d illustrated in
Under these conditions, when each of the routers 10 to 10d performs the adjustment using the round-robin system, the parallel computer system 1 allocates “50.0%” of the entire band to the communication between the computation node 2d and the computation node 2e. The parallel computer system 1 allocates a band of “25.0%” to the communication between the computation node 2c and the computation node 2e and a band of “12.8%” to the communication between the computation node 2b and the computation node 2e.
The parallel computer system 1 allocates a band of “6.1%” to the communication between the computation node 2a and the computation node 2e and a band of “6.5%” to the communication between the computation node 2 and the computation node 2e. Thus, when the routers 10 to 10d of the computation nodes 2 to 2d perform the adjustment using the round-robin system and one communication joins at each of the computation nodes 2a to 2d, the band is halved each time the hop count of the packets in the communication between each of the computation nodes 2 to 2d and the computation node 2e increases by 1.
Meanwhile, under the same conditions, when each of the routers 10 to 10d lets the packet with the largest joining number win the adjustment, the parallel computer system 1 allocates “20.2%” of the entire band to the communication between the computation node 2d and the computation node 2e. The parallel computer system 1 allocates a band of “19.8%” to the communication between the computation node 2c and the computation node 2e and a band of “19.4%” to the communication between the computation node 2b and the computation node 2e. The parallel computer system 1 allocates a band of “20.8%” to the communication between the computation node 2a and the computation node 2e and a band of “20.1%” to the communication between the computation node 2 and the computation node 2e. Thus, if each of the routers 10 to 10e lets the packet with the largest joining number win the adjustment, the parallel computer system 1 can allocate the band equally to the communication between each of the computation nodes 2 to 2d and the computation node 2e.
Next, a simulation result will be described for the case where each of the computation nodes 2 to 2d transmits packets to the computation node 2e and packets transmitted to the computation node 2e from computation nodes other than the computation nodes 2 to 2e always join along the way. In this simulation, 10000 packets of random sizes, each requiring 1 to 32 cycles to transmit, are allocated to the computation nodes 2 to 2d, and each of the computation nodes 2 to 2d transmits its allocated packets to the computation node 2e.
As illustrated in (D) of
Under these conditions, when each of the routers 10 to 10d performs adjustment of the packet using the round-robin system, the parallel computer system 1 allocates “19.9%” of the band between the computation node 2d and the computation node 2e to the communication between the computation node 2d and the computation node 2e. The parallel computer system 1 allocates “19.8%”, “20.4%”, and “19.9%” of that band, respectively, to the three communications that join at the computation node 2d from sources other than the computation node 2c and the NIC 4d.
The parallel computer system 1 allocates “6.6%” of that band to the communication between the computation node 2c and the computation node 2e and “6.6%” to the communication that joins at the computation node 2c from a source other than the computation node 2b and the NIC 4c. The parallel computer system 1 allocates “1.8%” of that band to the communication between the computation node 2b and the computation node 2e and “1.8%” and “1.6%”, respectively, to the two communications that join at the computation node 2b from sources other than the computation node 2a and the NIC 4b.
The parallel computer system 1 allocates “0.6%” of that band to the communication between the computation node 2a and the computation node 2e and “0.6%” to the communication that joins at the computation node 2a from a source other than the computation node 2 and the NIC 4a. In addition, the parallel computer system 1 allocates “0.7%” of that band to the communication between the computation node 2 and the computation node 2e. As such, when each of the routers 10 to 10d performs the adjustment of the packet using the round-robin system, the parallel computer system 1 cannot equalize the band allocated to each communication.
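The geometric shrinking of the band under round-robin adjustment can be reproduced with an idealized fluid model. This is a sketch under stated assumptions, not the simulator behind the figures above: every input is assumed to be saturated, each router splits its output band equally among its inputs, and `round_robin_shares` is a hypothetical helper. The extra-flow counts `[0, 1, 2, 1, 3]` for the routers 10 to 10d are inferred from the text (one joining communication at the node 2a, two at 2b, one at 2c, three at 2d).

```python
from fractions import Fraction


def round_robin_shares(extra_flows):
    """Idealized fluid model of round-robin adjustment along a chain of routers.

    extra_flows[i] is the number of flows joining router i from sources other
    than its own NIC and the upstream router. Index 0 is the first router in
    the chain (no upstream link); the last index feeds the destination node.
    Returns, per router, the fraction of the final link's band received by
    each input of that router (its NIC flow, or any one extra flow).
    """
    shares = [Fraction(0)] * len(extra_flows)
    band = Fraction(1)  # band of the output link of the router being examined
    for i in reversed(range(len(extra_flows))):
        inputs = 1 + extra_flows[i] + (1 if i > 0 else 0)  # NIC + extras + upstream
        band /= inputs   # round robin: each saturated input gets an equal slice
        shares[i] = band  # the upstream link also gets one slice, split again at i - 1
    return shares
```

With no extra flows, `round_robin_shares([0, 0, 0, 0, 0])` gives 1/16, 1/16, 1/8, 1/4 and 1/2 for the nodes 2, 2a, 2b, 2c and 2d, mirroring the halving per hop in the first simulation. With `[0, 1, 2, 1, 3]` it gives 1/180, 1/180, 1/60, 1/15 and 1/5 (about 0.6%, 0.6%, 1.7%, 6.7% and 20%), close to the round-robin figures of the second simulation.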
Meanwhile, under the same conditions, when each of the routers 10 to 10d lets the packet with the largest joining number win the adjustment, the parallel computer system 1 allocates “11.0%” of the band between the computation node 2d and the computation node 2e to the communication between the computation node 2d and the computation node 2e. The parallel computer system 1 allocates “11.1%”, “11.1%”, and “11.2%” of that band, respectively, to the three communications that join at the computation node 2d from sources other than the computation node 2c and the NIC 4d.
The parallel computer system 1 allocates “8.0%” of that band to the communication between the computation node 2c and the computation node 2e and “8.0%” to the communication that joins at the computation node 2c from a source other than the computation node 2b and the NIC 4c.
The parallel computer system 1 allocates “8.3%” of that band to the communication between the computation node 2b and the computation node 2e and “8.5%” and “8.4%”, respectively, to the two communications that join at the computation node 2b from sources other than the computation node 2a and the NIC 4b. The parallel computer system 1 allocates “5.0%” of that band to the communication between the computation node 2a and the computation node 2e and “4.8%” to the communication that joins at the computation node 2a from a source other than the computation node 2 and the NIC 4a.
In addition, the parallel computer system 1 allocates “5.0%” of the band between the computation node 2d and the computation node 2e to the communication between the computation node 2 and the computation node 2e. As such, when each of the routers 10 to 10e lets the packet with the largest joining number win the adjustment, the parallel computer system 1 can suppress the deviation of the band allocated to each communication and can allocate the band to each communication appropriately.
Next, a simulation result will be described for the case where each of the computation nodes 2 to 2e performs communication and packets transmitted to random destinations from computation nodes other than the computation nodes 2 to 2e always join. The size of the packets transmitted from each computation node, the total number of packets, and the number of packets that join at the routers 10 to 10d from computation nodes other than the computation nodes 2 to 2e are the same as those of the simulation described using
Under these conditions, when each of the routers 10 to 10e performs adjustment of the packet using the round-robin system, the parallel computer system 1 allocates “20.0%” of the band between the computation node 2d and the computation node 2e to the communication between the computation node 2d and the computation node 2e. The parallel computer system 1 allocates “20.0%” of that band to the three communications that join at the computation node 2d from sources other than the computation node 2c and the NIC 4d.
The parallel computer system 1 allocates “7.5%” of that band to the communication between the computation node 2c and the computation node 2e and “7.5%” to the communication that joins at the computation node 2c from a source other than the computation node 2b and the NIC 4c. The parallel computer system 1 allocates “2.0%” of that band to the communication between the computation node 2b and the computation node 2e and “2.0%” to the two communications that join at the computation node 2b from sources other than the computation node 2a and the NIC 4b.
The parallel computer system 1 allocates “0.7%” of that band to the communication between the computation node 2a and the computation node 2e and “0.7%” to the communication that joins at the computation node 2a from a source other than the computation node 2 and the NIC 4a. In addition, the parallel computer system 1 allocates “0.8%” of that band to the communication between the computation node 2 and the computation node 2e.
Meanwhile, under the same conditions, when each of the routers 10 to 10d lets the packet with the largest joining number win the adjustment, the parallel computer system 1 allocates “11.1%” of the band between the computation node 2d and the computation node 2e to the communication between the computation node 2d and the computation node 2e. The parallel computer system 1 allocates “11.1%” of that band to the three communications that join at the computation node 2d from sources other than the computation node 2c and the NIC 4d.
The parallel computer system 1 allocates “10.4%” of that band to the communication between the computation node 2c and the computation node 2e and “10.4%” to the communication that joins at the computation node 2c from a source other than the computation node 2b and the NIC 4c. The parallel computer system 1 allocates “11.6%” of that band to the communication between the computation node 2b and the computation node 2e and “11.6%” to the two communications that join at the computation node 2b from sources other than the computation node 2a and the NIC 4b.
The parallel computer system 1 allocates “8.3%” of that band to the communication between the computation node 2a and the computation node 2e and “8.3%” to the communication that joins at the computation node 2a from a source other than the computation node 2 and the NIC 4a. The parallel computer system 1 allocates “8.4%” of that band to the communication between the computation node 2 and the computation node 2e.
As such, when each of the routers 10 to 10d lets the packet with the largest joining number win the adjustment, the parallel computer system 1 can keep the ratio of the maximum band to the minimum band allocated to the communications within about “2:1”. For this reason, the parallel computer system 1 can suppress the deviation of the band allocated to each communication and can allocate the band to each communication appropriately.
Next, an example in which computation nodes whose routers perform the adjustment on the basis of the joining number stored in the packets are connected by a meshed network will be described using
Each computation node illustrated in
For example, a range that is illustrated by (A) of
For this reason, the router that is illustrated by (B) of
Flow of an Adjusting Process
Next, a flow of a process of adjusting the packet by the router 10 will be described using
Next, the router 10 checks for competition among the transmission requests of the ports and updates the values stored in register A and register B (step S104). Next, the router 10 executes the adjusting process using the value of register B (step S105). The router 10 determines whether the packet received at each port wins the adjustment (step S106). For a packet that loses the adjustment (No in step S106), the router 10 adds 1 to the value stored in register B of the joining number updating unit associated with the port receiving the packet (step S107). Then, the router 10 executes the adjustment again using the new value of register B (step S105).
For the packet that wins the adjustment (Yes in step S106), the router 10 transmits the transmission permission to the reception processing unit of the port receiving the packet and transmits the value stored in register A to the transmission processing unit of the transmitting port (step S108). Next, the router 10 moves the packet from the receiving port to the transmitting port through the data path switch (step S109). The transmission processing unit of the transmitting port stores the value of register A as the joining number in the header portion of the packet (step S110). Then, the router 10 transmits the packet to the computation node that is the output destination (step S111) and ends the process.
As described above, when the router 10 receives packets, the router 10 acquires from each packet its joining number, that is, the cumulative number of other packets that have competed with the packet in the adjusting processes it has participated in. The router 10 updates the acquired joining number on the basis of the number of received packets, that is, the number of other packets competing in the adjustment. The router 10 selects the packet to be transmitted to the router 10a on the basis of the updated joining number of each packet. Then, the router 10 stores the updated joining number in the header of the selected packet and transmits the selected packet to the router 10a.
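The adjustment summarized above (steps S104 to S110) can be sketched as follows. This is a minimal illustration, not the router's actual logic: `Request`, `adjust` and `cycles_until_local_wins` are hypothetical names, register A is assumed to be recomputed from the current number of competitors at every adjustment, and the tie-break (more losses first, then the lower port number) is an assumption because the text does not specify one.

```python
from dataclasses import dataclass


@dataclass
class Request:
    port: int        # receiving port that holds the packet
    joining: int     # joining number read from the packet header
    losses: int = 0  # adjustments this packet has already lost


def adjust(requests):
    """One adjustment among packets requesting the same transmitting port.

    Register A: acquired joining number + (number of competitors - 1).
    Register B: register A + 1 for every adjustment already lost.
    The packet with the largest register B wins; ties go to the packet that
    has lost more often, then to the lower port number (assumed tie-break).
    Returns (winning port, joining number to store in its header).
    """
    others = len(requests) - 1

    def register_a(r):
        return r.joining + others

    winner = max(requests, key=lambda r: (register_a(r) + r.losses, r.losses, -r.port))
    for r in requests:
        if r is not winner:
            r.losses += 1  # step S107: a packet that lost gains priority
    return winner.port, register_a(winner)  # step S110: header gets register A


def cycles_until_local_wins(local_joining, upstream_joining):
    """A waiting local packet competes each cycle with a fresh upstream packet."""
    local = Request(port=0, joining=local_joining)
    cycle = 0
    while True:
        cycle += 1
        upstream = Request(port=1, joining=upstream_joining)
        port, header = adjust([local, upstream])
        if port == local.port:
            return cycle, header
```

For example, `cycles_until_local_wins(1, 5)` returns `(5, 2)`: the local packet loses four times, wins the fifth adjustment once its register B has caught up, and leaves carrying register A (joining number 2) rather than the loss-inflated register B. This is the mechanism that keeps a packet from waiting indefinitely.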
For this reason, the parallel computer system 1 can appropriately distribute the band to the communication between the computation nodes 2 to 2e. Because the router 10 performs the adjustment on the basis of the joining number stored in each packet, the adjustment is based on information indicating the band that each communication receives. In other words, the parallel computer system 1 performs the adjustment on the basis of the number of times each packet has collided with other packets on its way to the destination. Therefore, the communication between the computation nodes 2 to 2e can be performed with high efficiency.
In this case, since the cumulative number of other packets that each packet competes with is several tens at most, the number of bits needed to store the joining number is smaller than the number of bits needed to store information indicating time. For example, the SeaStar router used in Cray parallel computers stores 10 bits of information indicating latency in each packet. In contrast, if the joining number of the packets is assumed to be about 32, 5 bits suffice to store it. For this reason, when the parallel computer system 1 controls the band on the basis of the joining number, the header portion of the packet becomes smaller, so more data can be stored in one packet. Therefore, the parallel computer system 1 can perform the communication between the computation nodes 2 to 2e appropriately.
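The header-size comparison is simple bit arithmetic. In the sketch below, `bits_needed` is a hypothetical helper, and encoding joining numbers 1 to 32 as the 32 codes 0 to 31 is an assumption; storing the value 32 directly would need a sixth bit.

```python
def bits_needed(distinct_values: int) -> int:
    """Bits for a header field that must distinguish `distinct_values` codes."""
    if distinct_values < 1:
        raise ValueError("need at least one value")
    return max(1, (distinct_values - 1).bit_length())


latency_bits = bits_needed(1024)  # a 10-bit latency field covers 1024 codes
joining_bits = bits_needed(32)    # joining numbers 1..32 stored as 0..31 (assumption)
print(latency_bits, joining_bits, latency_bits - joining_bits)
```

Under these assumptions the script prints `10 5 5`: the joining-number field frees five header bits per packet compared with a 10-bit latency field.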
The range of joining numbers that a packet can reach depends on the scale of the parallel computer system 1. For example, if the joining number is incremented by 1 at each hop, a joining number of 32 allows a packet to hop up to 16 times in each axial direction of a two-dimensional meshed network before it is received. For this reason, the two-dimensional meshed network can support a parallel computer system with 17×17=289 computation nodes. Likewise, a two-dimensional toroidal network can support a parallel computer system with 32×32=1024 computation nodes. This estimate assumes that the joining number is incremented by 1 at each hop; the transmission system can of course also be applied to a parallel computer system with more computation nodes if its communication frequency is low.
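The node counts above follow from splitting the joining-number budget evenly between the two axes of the worst-case route. The helper `max_nodes` below is a hypothetical sketch that assumes one increment per hop and, for the torus, an even number of nodes per axis (so the worst-case distance along an axis is half its length).

```python
def max_nodes(joining_budget: int, topology: str) -> int:
    """Largest square 2-D network a joining-number budget can cover.

    Assumes the joining number grows by 1 per hop and the worst-case route
    uses half the budget along each axis.
    """
    hops_per_axis = joining_budget // 2
    if topology == "mesh":
        side = hops_per_axis + 1  # h hops span h + 1 nodes along a mesh axis
    elif topology == "torus":
        side = 2 * hops_per_axis  # even side assumed: wraparound halves the distance
    else:
        raise ValueError(f"unknown topology: {topology}")
    return side * side
```

With a budget of 32, `max_nodes(32, "mesh")` gives 289 and `max_nodes(32, "torus")` gives 1024, matching the 17×17 and 32×32 figures.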
The joining number can be measured easily regardless of the scale of the parallel computer system 1, the topology connecting the computation nodes 2 to 2e, the communication pattern between the computation nodes 2 to 2e, and the routing algorithm. For this reason, when each of the routers 10 to 10e performs the adjustment on the basis of the joining number, the parallel computer system 1 can appropriately allocate the band to the communication between the computation nodes 2 to 2e regardless of their configuration. The parallel computer system 1 can also allocate the band appropriately to each communication without allocating further band to communication that already has sufficient band.
The router 10 sets, as the new joining number, the value obtained by adding the number of received packets minus 1 (that is, excluding the packet itself) to the joining number acquired from each packet, and updates the joining number of the packet transmitted to the router 10a with the new joining number. In other words, the router 10 adds to each packet's joining number the number of other received packets competing with it for the transmitting port. For this reason, the router 10 correctly adds the number of packets competing in its own adjustment to the joining number of each packet transmitted to the router 10a. As a result, the parallel computer system 1 can appropriately allocate the band to the communication between the computation nodes 2 to 2e.
Since the router 10 can easily count the number of packets competing in its own adjustment, it can store information indicating the band in each packet without executing any complicated process. As a result, the router 10 is easy to implement.
The router 10 compares the joining numbers acquired from the packets and transmits the packet with the largest joining number to the router 10a. For this reason, the parallel computer system 1 allocates a wider band to the communication that has been allocated the least band among the communications between the computation nodes 2 to 2e. Therefore, the parallel computer system 1 can allocate the band equally to the communication between the computation nodes 2 to 2e.
For a packet that was not transmitted in the previous adjustment, the router 10 executes a new adjustment using a value obtained by adding 1 to the updated joining number. That is, a packet that has lost the adjustment participates in the next adjustment with a priority raised by the number of times it has lost. Consequently, the router 10 eventually transmits every packet to the router 10a, and the parallel computer system 1 can prevent deadlock.
The parallel computer system 1 according to an aspect of the invention has been described above. However, the invention may be embodied in various forms other than the parallel computer system 1 described above. Therefore, other embodiments included in the invention will be described below as the second embodiment.
(1) With Respect to Each of the Routers 10 to 10e
Each of the routers 10 to 10e lets the packet with the largest joining number among the packets participating in the adjustment win. However, the embodiments are not limited thereto, and any process may be executed as long as the band can be appropriately allocated to the communication between the computation nodes 2 to 2e on the basis of the joining number stored in each packet.
For example, each of the routers 10 to 10e may calculate a priority by weighting the joining number stored in each packet according to the transmission destination of the packet, and perform the adjustment on the basis of the calculated priority. In this case, the parallel computer system 1 can not only allocate the band equally to the communication between the computation nodes 2 to 2e but also allocate the band that is set for each pair of computation nodes.
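A destination-weighted variant can be sketched as below. The source does not say how the weights are chosen or combined; multiplying the joining number by a per-destination factor, the helper `weighted_adjust`, and the destination labels are all hypothetical.

```python
def weighted_adjust(requests, weight_by_destination):
    """Pick the winner among (joining_number, destination) requests.

    Hypothetical variant: priority = joining number x weight of the packet's
    destination; destinations without an entry get weight 1.0.
    Returns the index of the winning request (the first one on ties).
    """
    def priority(req):
        joining, destination = req
        return joining * weight_by_destination.get(destination, 1.0)

    return max(range(len(requests)), key=lambda i: priority(requests[i]))


requests = [(4, "node2e"), (3, "node2x")]  # hypothetical destination labels
print(weighted_adjust(requests, {}))                 # unweighted: largest joining number
print(weighted_adjust(requests, {"node2x": 2.0}))    # node2x traffic favoured
```

Without weights the packet with joining number 4 wins (index 0); doubling the weight of the second destination raises its priority to 6, so that packet wins instead (index 1).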
Each of the routers 10 to 10e may have a display device that externally displays the number of packets participating in the adjustment. In this case, when packet congestion occurs, a user of the parallel computer system 1 can easily identify the joining point where the congestion started. Once congestion occurs, the buffer resources along the entire path of the packets are exhausted, so it is difficult to find the starting point of the congestion even by monitoring the buffer usage or credit usage of each of the routers 10 to 10e. In contrast, the number of packets competing in a router increases only where heavy joining occurs. For this reason, by externally displaying the number of packets competing in each of the routers 10 to 10e, the parallel computer system 1 lets the user easily identify where the congestion originated.
Each of the routers 10 to 10e may also externally display the joining number of the packets received at each port. The routers 10 to 10e may count the cumulative number of virtual channels (VCs) competing in the adjustment between VCs and display it externally. With such routers, the user of the parallel computer system 1 can easily identify where competition between VCs frequently occurs.
When a flag indicating that the adjustment is not to be performed using the joining number is stored in the header of a packet, each of the routers 10 to 10e may use any adjusting method, including the round-robin system.
(2) With Respect to an Initial Value of the Joining Number
When the NICs 4 to 4e according to the first embodiment generate packets, they store “1” as the initial value of the joining number. However, the embodiments are not limited thereto. For example, when generating packets that are important for system management, the NICs 4 to 4e may store a value of “2” or more as the initial value of the joining number so that these packets are transmitted preferentially.
For example, when the NIC 4 generates packets with “2” stored as the initial value of the joining number, the parallel computer system 1 can allocate twice the normal band to the communication using these packets. Likewise, when the NIC 4 stores an arbitrary number “n” as the initial value of the joining number, the parallel computer system 1 can allocate “n” times the normal band to the communication using these packets.
(3) With Respect to the Packet
The packet described above has the identification information, the joining number, and the flag in its header portion. However, the embodiments are not limited thereto; a packet of any protocol may be used as long as the joining number is stored in its header portion.
According to an aspect, a band can appropriately be distributed to communication between computation nodes without deteriorating data transmission efficiency.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2011-063400 | Mar 2011 | JP | national |