Claims
- 1. A method of performing arithmetic functions, using a shift and operate procedure, in a computer system having a distributed parallel torus architecture with a multitude of interconnected nodes, the method comprising the steps:
providing each of a group of the nodes with the same set of data values; performing a global arithmetic operation, wherein each of the nodes performs the arithmetic operation on all of the data values to obtain a final value; and ensuring that all of the nodes of the group perform said global operation on the data values in the same order to ensure binary reproducible results.
- 2. A method according to claim 1, wherein the ensuring step includes the step of, each node performing the global arithmetic operation after the node is provided with all of the data values.
- 3. A method according to claim 2, wherein the providing step includes the step of each node of the group receiving the data values from other nodes of the group.
- 4. A method according to claim 1, wherein the nodes are connected together by bidirectional links, and the providing step includes the step of sending the data values to the nodes in two directions over said links.
- 5. A method according to claim 1, wherein the providing step includes the step of each one of the nodes injecting one of the data values into the network only once.
- 6. A method according to claim 5, wherein the injecting step includes the step of, nodes of the group of nodes, other than said each one of the nodes, forwarding said one of the data values to other nodes of the group to reduce the latency of the global operation.
- 7. A system for performing arithmetic functions, using a shift and operate procedure, in a computer system having a distributed parallel torus architecture with a multitude of interconnected nodes, the system comprising:
a group of the nodes provided with the same set of data values; means for performing a global arithmetic operation, wherein each of the nodes performs the arithmetic operation on all of the data values to obtain a final value; and means for ensuring that all of the nodes of the group perform said global operation on the data values in the same order to ensure binary reproducible results.
- 8. A system according to claim 7, wherein the ensuring means includes means for performing the global arithmetic operation at each node after the node is provided with all of the data values.
- 9. A system according to claim 7, wherein each node of the group receives the data values from other nodes of the group.
- 10. A system according to claim 7, wherein the nodes are connected together by bidirectional links, and the providing means includes means for sending the data values to the nodes in two directions over said links.
- 11. A system according to claim 7, wherein each one of the nodes injects one of the data values into the network only once.
- 12. A system according to claim 7, wherein nodes of the group of nodes, other than said each one of the nodes, forward said one of the data values to other nodes of the group to reduce the latency of the global operation.
- 13. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for performing arithmetic functions, using a shift and operate procedure, in a computer system having a distributed parallel torus architecture with a multitude of interconnected nodes, the method steps comprising:
providing each of a group of the nodes with the same set of data values; performing a global arithmetic operation, wherein each of the nodes performs the arithmetic operation on all of the data values to obtain a final value; and ensuring that all of the nodes of the group perform said global operation on the data values in the same order to ensure binary reproducible results.
- 14. A program storage device according to claim 13, wherein the ensuring step includes the step of, each node performing the global arithmetic operation after the node is provided with all of the data values.
- 15. A program storage device according to claim 14, wherein the providing step includes the step of each node of the group receiving the data values from other nodes of the group.
- 16. A program storage device according to claim 13, wherein the nodes are connected together by bi-directional links, and the providing step includes the step of sending the data values to the nodes in two directions over said links.
- 17. A program storage device according to claim 13, wherein the providing step includes the step of each one of the nodes injecting one of the data values into the network only once.
- 18. A program storage device according to claim 17, wherein the injecting step includes the step of, nodes of the group of nodes, other than said each one of the nodes, forwarding said one of the data values to other nodes of the group to reduce the latency of the global operation.
- 19. A method of performing an arithmetic function in a computer system having a multitude of nodes interconnected by a global tree network that supports integer combining operations, the method comprising the steps of:
providing each of a group of nodes with first values; processing each of the first values, according to a first defined process, to obtain a respective second value from each of the first values, wherein all of the second values are integer values; and performing a global integer combine operation, using said second values, over the network.
- 20. A method according to claim 19, wherein the performing step includes the step of performing a global unsigned integer sum over the network.
- 21. A method according to claim 19, wherein the performing step includes the step of performing a global maximum operation over the network, and using results of said global maximum operation to identify the maximum of the first values.
- 22. A method according to claim 19, wherein the performing step includes the step of performing a global maximum operation over the network, and using the results of said global maximum operation to identify the minimum of the first values.
- 23. A system for performing an arithmetic function in a computer system having a multitude of nodes interconnected by a global tree network that supports integer combining operations, the system comprising:
a group of nodes, each of the nodes of the group being provided with first values; a processor to process each of the first values, according to a first defined process, to obtain a respective second value from each of the first values, wherein all of the second values are integer values; and means for performing a global integer combine operation, using said second values, over the network.
- 24. A system according to claim 23, wherein the means for performing includes means for performing a global unsigned integer sum over the network.
- 25. A system according to claim 23, wherein the means for performing includes means for performing a global maximum operation over the network, and for using results of said global maximum operation to identify the maximum of the first values.
- 26. A system according to claim 23, wherein the means for performing includes means for performing a global maximum operation over the network, and for using the results of said global maximum operation to identify the minimum of the first values.
- 27. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for performing an arithmetic function in a computer system having a multitude of nodes interconnected by a global tree network that supports integer combining operations, the method steps comprising:
providing each of a group of nodes with first values; processing each of the first values, according to a first defined process, to obtain a respective second value from each of the first values, wherein all of the second values are integer values; and performing a global integer combine operation, using said second values, over the network.
- 28. A program storage device according to claim 27, wherein the performing step includes the step of performing a global unsigned integer sum over the network.
- 29. A program storage device according to claim 27, wherein the performing step includes the step of performing a global maximum operation over the network, and using results of said global maximum operation to identify the maximum of the first values.
- 30. A program storage device according to claim 27, wherein the performing step includes the step of performing a global maximum operation over the network, and using the results of said global maximum operation to identify the minimum of the first values.
- 31. A method of performing a global operation on a computer system having a multitude of nodes interconnected by a global tree network that supports integer combining operations, the method comprising:
providing each node with one or more numbers of any type; assembling the numbers of the nodes into an array, the array having a given number of positions, said assembling step including the steps of
i) each node putting one or more of the numbers of the node into one or more of the positions of the array, and putting zero values into all of the other positions of the array, and ii) using the global tree network to sum all the numbers put into each position in the array.
- 32. A method according to claim 31, wherein:
the given number of positions of the array are arranged in a defined sequence; and the assembling step includes the further step of, each node establishing an associated array also having said given number of positions arranged in the defined sequence, and putting the one or more numbers of the node into one or more of the positions of the associated array, and putting zero values in all of the other positions of the associated array.
- 33. A system for performing a global operation on a computer system having a multitude of nodes interconnected by a global tree network that supports integer combining operations, the system comprising:
a group of nodes, each of the group of nodes having one or more numbers; means for assembling the numbers of the nodes into an array, the array having a given number of positions, said assembling means including
i) means for putting one or more of the numbers of the nodes of the group into one or more of the positions of the array, and for putting zero values into all of the other positions of the array, and ii) means for using the global tree network to sum all the numbers put into each position in the array.
- 34. A system according to claim 33, wherein:
the given number of positions of the array are arranged in a defined sequence; and the means for assembling further includes means for establishing a respective one array associated with each of the nodes of the group and also having said given number of positions arranged in the defined sequence, for putting the one or more numbers of each of the nodes into one or more of the positions of the associated array, and for putting zero values in all of the other positions of the associated array.
- 35. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for performing a global operation on a computer system having a multitude of nodes interconnected by a global tree network that supports integer combining operations, the method steps comprising:
providing each node with one or more numbers; assembling the numbers of the nodes into an array, the array having a given number of positions, said assembling step including the steps of
a. each node putting one or more of the numbers of the node into one or more of the positions of the array, and putting zero values into all of the other positions of the array, and b. using the global tree network to sum all the numbers put into each position in the array.
- 36. A program storage device according to claim 35, wherein:
the given number of positions of the array are arranged in a defined sequence; and the assembling step includes the further step of, each node establishing an associated array also having said given number of positions arranged in the defined sequence, and putting the one or more numbers of the node into one or more of the positions of the associated array, and putting zero values in all of the other positions of the associated array.
- 37. A method of performing an arithmetic function in a computer system having a multitude of nodes interconnected by a global tree network that supports integer combining operations, the method comprising the steps of:
each of the nodes contributing a set of first values; and performing a global integer combine operation, using said first values, over the network.
- 38. A method according to claim 37, wherein the performing step includes the step of using results of the global integer combine operation to identify a characteristic of the first values.
- 39. A system for performing an arithmetic function in a computer system having a multitude of nodes interconnected by a global tree network that supports integer combining operations, the system comprising:
a group of nodes, each of the nodes of the group contributing a set of first values; and a processor to perform a global integer combine operation, using the first values, over the network.
- 40. A system according to claim 39, wherein the processor includes means for using results of the global integer combine operation to identify a characteristic of the first values.
- 41. A method of operating a parallel processing computer system having a multitude of nodes interconnected by both a global tree network and a torus network, the method comprising:
using the computer system to perform defined operations; and using both the torus and tree networks to cooperate on reduction operations.
- 42. A method according to claim 41, wherein the step of using both the torus and tree networks includes the step of doing so by having one processor handle torus operations and another processor handle the tree operations.
- 43. A method according to claim 41, wherein the step of using both the torus and tree networks includes the step of doing so by arranging the torus communications so that, in a three-dimensional torus, no node on the torus receives more than two packets to combine.
- 44. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for operating a parallel processing computer system having a multitude of nodes interconnected by both a global tree network and a torus network, the method steps comprising:
using the computer system to perform defined operations; and using both the torus and tree networks to cooperate on reduction operations.
- 45. A program storage device according to claim 44, wherein the step of using both the torus and tree networks includes the step of doing so by having one processor handle torus operations and another processor handle the tree operations.
- 46. A program storage device according to claim 44, wherein the step of using both the torus and tree networks includes the step of doing so by arranging the torus communications so that, in a three-dimensional torus, no node on the torus receives more than two packets to combine.
- 47. A parallel processing computer system comprising:
a multitude of nodes; a global tree network also interconnecting the nodes; a torus network also interconnecting the nodes; and means for using both the torus and tree networks to cooperate on reduction operations.
- 48. A computer system according to claim 47, wherein the means for using both the torus and tree networks includes one processor to handle torus operations and another processor to handle the tree operations.
- 49. A computer system according to claim 47, wherein the means for using both the torus and tree networks includes means for arranging the torus communications so that, in a three-dimensional torus, no node on the torus receives more than two packets to combine.
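By way of illustration only, the ordering requirement of claims 1 to 6 can be sketched in a small simulation. The sketch assumes a one-dimensional torus (a ring), a floating-point sum as the global arithmetic operation, and summation in ascending order of originating node as the agreed order; the function names `ring_allgather`, `ordered_sum`, `arrival_sum` and `bit_pattern` are hypothetical and do not appear in the claims.

```python
import struct


def ring_allgather(values):
    """Simulate a one-dimensional torus (ring) of len(values) nodes.

    Each node injects its own value once, in both directions; every other
    node stores the value and forwards it to its next neighbor. Returns one
    dict per node mapping origin node -> value, keyed in arrival order.
    """
    n = len(values)
    gathered = [{i: values[i]} for i in range(n)]
    # plus[i] / minus[i]: packet currently at node i travelling in the
    # +1 / -1 direction, stored as (origin, value).
    plus = [(i, values[i]) for i in range(n)]
    minus = [(i, values[i]) for i in range(n)]
    for _ in range(n // 2):
        plus = [plus[(i - 1) % n] for i in range(n)]
        minus = [minus[(i + 1) % n] for i in range(n)]
        for i in range(n):
            for origin, value in (plus[i], minus[i]):
                gathered[i].setdefault(origin, value)
    return gathered


def ordered_sum(node_view):
    """Sum in ascending origin order (the assumed common order)."""
    total = 0.0
    for origin in sorted(node_view):
        total += node_view[origin]
    return total


def arrival_sum(node_view):
    """Sum in the order the values happened to arrive at this node."""
    total = 0.0
    for value in node_view.values():
        total += value
    return total


def bit_pattern(x):
    """Raw IEEE 754 double encoding, used to compare results bit for bit."""
    return struct.pack(">d", x)


values = [1e16, 1.0, -1e16, 1.0, 0.5]
views = ring_allgather(values)
ordered = {bit_pattern(ordered_sum(v)) for v in views}
arrival = {bit_pattern(arrival_sum(v)) for v in views}
print("distinct results, common order: ", len(ordered))
print("distinct results, arrival order:", len(arrival))
```

Because floating-point addition is not associative, nodes that add the values as they arrive may disagree in the low-order bits, whereas nodes that wait for all of the values and then add them in one common order obtain identical bit patterns.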
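As a non-limiting sketch of claims 19 to 22, the example below converts floating-point first values into unsigned integer second values and then uses only an integer maximum to find both the maximum and the minimum. The order-preserving encoding of IEEE 754 doubles is an assumed choice of the "first defined process", the pairwise `tree_combine` loop is a stand-in for the global tree network, and all names are hypothetical. NaN inputs are not handled.

```python
import struct

MASK64 = (1 << 64) - 1
SIGN = 1 << 63


def float_to_key(x):
    """Map a double to an unsigned 64-bit integer with the same ordering."""
    (u,) = struct.unpack(">Q", struct.pack(">d", x))
    if u & SIGN:
        return u ^ MASK64  # negative: flip every bit
    return u | SIGN        # non-negative: set the top bit


def key_to_float(k):
    """Inverse of float_to_key."""
    if k & SIGN:
        u = k ^ SIGN
    else:
        u = k ^ MASK64
    return struct.unpack(">d", struct.pack(">Q", u))[0]


def tree_combine(values, op):
    """Combine values pairwise, level by level, as a tree would."""
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def global_max(first_values):
    keys = [float_to_key(x) for x in first_values]
    return key_to_float(tree_combine(keys, max))


def global_min(first_values):
    # Complement each key so that the largest complement marks the
    # smallest original value; only a maximum operation is used.
    complements = [MASK64 - float_to_key(x) for x in first_values]
    return key_to_float(MASK64 - tree_combine(complements, max))


first_values = [3.5, -2.25, 0.0, 7.0, -10.5]
print(global_max(first_values), global_min(first_values))
```

The same `tree_combine` helper could be given integer addition instead of `max` to model the unsigned integer sum of claim 20.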
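The array assembly of claims 31 and 32 can likewise be sketched, under stated assumptions. Here each node holds a single number, array position equals node rank, and numbers of any type are carried as their 64-bit patterns so that an integer sum with zeros leaves each pattern unchanged; these choices, and the names `to_bits`, `from_bits`, `local_array`, `global_sum` and `assemble`, are illustrative assumptions rather than part of the claims.

```python
import struct


def to_bits(x):
    """Reinterpret a double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def from_bits(b):
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def local_array(rank, n_positions, number):
    """Array a node contributes: its number in its slot, zeros elsewhere."""
    arr = [0] * n_positions
    arr[rank] = to_bits(number)
    return arr


def global_sum(arrays):
    """Stand-in for the tree network: integer sum at each array position."""
    return [sum(column) for column in zip(*arrays)]


def assemble(numbers):
    """Return the assembled array that every node would receive."""
    n = len(numbers)
    contributions = [local_array(r, n, x) for r, x in enumerate(numbers)]
    return [from_bits(b) for b in global_sum(contributions)]


print(assemble([1.5, -2.0, 3.25]))
```

Since every position receives exactly one non-zero contribution, the position-wise sum reproduces each node's number exactly and the summed array is the same on all nodes.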
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present invention claims the benefit of commonly-owned, co-pending U.S. Provisional Patent Application Serial No. 60/271,124 filed Feb. 24, 2001 entitled MASSIVELY PARALLEL SUPERCOMPUTER, the whole contents and disclosure of which is expressly incorporated by reference herein as if fully set forth herein. This patent application is additionally related to the following commonly-owned, co-pending United States patent applications filed on even date herewith, the entire contents and disclosure of each of which is expressly incorporated by reference herein as if fully set forth herein: U.S. patent application Ser. Nos. (YOR920020027US1, YOR920020044US1 (15270)), for “Class Networking Routing”; U.S. patent application Ser. No. (YOR920020028US1 (15271)), for “A Global Tree Network for Computing Structures”; U.S. patent application Ser. No. (YOR920020029US1 (15272)), for “Global Interrupt and Barrier Networks”; U.S. patent application Ser. No. (YOR920020030US1 (15273)), for “Optimized Scalable Network Switch”; U.S. patent application Ser. Nos. (YOR920020031US1, YOR920020032US1 (15258)), for “Arithmetic Functions in Torus and Tree Networks”; U.S. patent application Ser. Nos. (YOR920020033US1, YOR920020034US1 (15259)), for “Data Capture Technique for High Speed Signaling”; U.S. patent application Ser. No. (YOR920020035US1 (15260)), for “Managing Coherence Via Put/Get Windows”; U.S. patent application Ser. Nos. (YOR920020036US1, YOR920020037US1 (15261)), for “Low Latency Memory Access And Synchronization”; U.S. patent application Ser. No. (YOR920020038US1 (15276)), for “Twin-Tailed Fail-Over for Fileservers Maintaining Full Performance in the Presence of Failure”; U.S. patent application Ser. No. (YOR920020039US1 (15277)), for “Fault Isolation Through No-Overhead Link Level Checksums”; U.S. patent application Ser. No. (YOR920020040US1 (15278)), for “Ethernet Addressing Via Physical Location for Massively Parallel Systems”; U.S. patent application Ser. No. (YOR920020041US1 (15274)), for “Fault Tolerance in a Supercomputer Through Dynamic Repartitioning”; U.S. patent application Ser. No. (YOR920020042US1 (15279)), for “Checkpointing Filesystem”; U.S. patent application Ser. No. (YOR920020043US1 (15262)), for “Efficient Implementation of Multidimensional Fast Fourier Transform on a Distributed-Memory Parallel Multi-Node Computer”; U.S. patent application Ser. No. (YOR9-20010211US2 (15275)), for “A Novel Massively Parallel Supercomputer”; and U.S. patent application Ser. No. (YOR920020045US1 (15263)), for “Smart Fan Modules and System”.
PCT Information
Filing Document | Filing Date | Country | Kind
PCT/US02/05618 | 2/25/2002 | WO |