This disclosure relates in general to the field of computing, and more particularly, to a collective communication operation.
Interconnected networks are a critical component of some modern computer systems. As processor and memory performance, as well as the number of processors in a multicomputer system, continues to increase, multicomputer interconnected networks are becoming even more critical. One capability of an interconnected network is parallel computing. One aspect of parallel computing is the ability to perform collective communication operations. Generally, a collective communication operation can be thought of as a communication operation that involves a group of processes.
To provide a more complete understanding of the present disclosure and features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying figures, wherein like reference numerals represent like parts, in which:
The FIGURES of the drawings are not necessarily drawn to scale, as their dimensions can be varied considerably without departing from the scope of the present disclosure.
The following detailed description sets forth example embodiments of apparatuses, methods, and systems relating to a communication system for enabling a collective communication operation. Features such as structure(s), function(s), and/or characteristic(s), for example, are described with reference to one embodiment as a matter of convenience; various embodiments may be implemented with any suitable one or more of the described features.
Elements of
Communication system 100 may include a configuration capable of transmission control protocol/Internet protocol (TCP/IP) communications for the transmission or reception of packets in a network. Communication system 100 may also operate in conjunction with a user datagram protocol/IP (UDP/IP) or any other suitable protocol where appropriate and based on particular needs.
For purposes of illustrating certain example techniques of communication system 100, it is important to understand the communications that may be traversing the network environment. The following foundational information may be viewed as a basis from which the present disclosure may be properly explained.
Users have more communications choices than ever before. A number of prominent technological trends are currently afoot (e.g., more computing devices, more connected devices, etc.). One current trend is interconnected networks. Interconnected networks are a critical component of some modern computer systems. From large scale systems to multicore architectures, the interconnected network that connects processors and memory modules significantly impacts the overall performance and cost of the system. As processor and memory performance continue to increase, multicomputer interconnected networks are becoming even more critical as they largely determine the bandwidth and latency of remote memory access.
One type of interconnected network allows for parallel computing. Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. One aspect of parallel computing is a collective communication operation. For example, a gather or scatter operation is a collective communication operation that can be performed on a parallel system.
An allgather operation is a type of gather operation that includes an event where data on each node is combined and spread over the network so each node on the network has the same (or nearly the same) data. In an allgather operation, each node collects the data from each of the other nodes. For example, node 0 contributes data d0, node 1 contributes data d1, node 2 contributes data d2, etc., and as a result, each node (e.g., node 0, node 1, node 2, etc.) has the data from the other nodes (e.g., the data array of d0, d1, d2, etc.).
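By way of illustration only, the allgather operation described above may be sketched in Python; the node indices and data items (d0, d1, d2) follow the example in the text, while the function name and list-based representation are assumptions made for clarity, not part of any particular implementation.

```python
# Minimal sketch of an allgather: every node contributes one data item,
# and after the operation every node holds the full data array.
def allgather(contributions):
    """Return, for each contributing node, a copy of all contributions."""
    combined = list(contributions)                  # d0, d1, d2, ...
    return [list(combined) for _ in contributions]

# Node 0 contributes d0, node 1 contributes d1, node 2 contributes d2;
# afterwards each node holds [d0, d1, d2].
results = allgather(["d0", "d1", "d2"])
```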
In a scatter operation, the opposite of a gather operation occurs. More specifically, in a gather operation, data is gathered on a single process, while in a scatter operation a single process has the data that is to be distributed to other processes. For example, process P0 has data d0, d1, d2, d3. The scatter operation results in process P0 with data d0, process P1 with data d1, process P2 with data d2, and so on. Generally, a collective communication operation can be thought of as a communication that involves a group of processes. What is needed is an efficient system and method for a collective communication operation, especially on a multi-level direct network topology.
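Similarly, the scatter example above (process P0 distributing d0 through d3) may be sketched as follows; the process labels and dictionary-based result are illustrative assumptions.

```python
# Minimal sketch of a scatter: a single root process holds the full data
# and distributes one item to each participating process.
def scatter(root_data, num_procs):
    """Give item i of root_data to process Pi."""
    assert len(root_data) == num_procs
    return {f"P{i}": root_data[i] for i in range(num_procs)}

# P0 starts with d0, d1, d2, d3 and scatters one item per process.
result = scatter(["d0", "d1", "d2", "d3"], 4)
```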
Recursive doubling and ring based approaches are well-known algorithms for an allgather operation, but these methods are oblivious to network topology. Also, the recursive doubling approach can cause network contention because of the nature of its communication pattern. A ring based approach for an allgather operation can avoid contention but has a significantly large number of phases (as many as there are nodes participating in the collective) that leads to a large runtime.
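The phase count of the ring based approach can be seen in a short simulation; this is a simplified sketch (synchronous phases on a unidirectional ring) rather than a description of any particular implementation.

```python
# Ring allgather sketch: in each phase, every node forwards the item it
# most recently received to its right neighbor. With N participating
# nodes, N-1 phases are required, which leads to the large runtime noted
# above.
def ring_allgather(items):
    n = len(items)
    buffers = [[item] for item in items]    # each node starts with its own item
    phases = 0
    for _ in range(n - 1):
        for i in range(n):
            # node i receives from its left neighbor the item that
            # neighbor obtained in the previous phase
            buffers[i].append(buffers[(i - 1) % n][phases])
        phases += 1
    return buffers, phases
```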
A communication system that allows for a collective communication operation, as outlined in
Also, in an allgather or scatter process, the node can receive data from the second node, where the data is related to an allgather process or a scatter process. If the data is related to an allgather process, the received data can be combined with the consolidated data before communicating the combined consolidated data to another node, in another collection of nodes, in another group of nodes. In some examples, the consolidated data is communicated using a pipeline, where the pipeline includes more than one communication path and the consolidated data is divided into portions, where the number of portions is equal to the number of communication paths. The node may be part of an interconnected network and may be part of a multi-tiered dragonfly topology network or some other parallel computing architecture.
One messaging system designed for parallel computing architectures is message passing interface (MPI). MPI defines an application programming interface (API) for message passing in parallel programs. MPI defines both point-to-point communication routines, such as sends and receives between pairs of processes and collective communication routines that involve a group of processes that need to perform some operation together. For example, MPI can define broadcasting data from a root process to other processes and finding a global minimum or maximum of data values on all processes (called a reduction operation). Collective communication routines provide the user with a simple interface for commonly required operations and can also enable a system to optimize these operations for a particular architecture. As a result, collective communication is widely and frequently used in many applications and the performance of collective communication routines is often critical to the performance of the overall system or application. In an example, communication system 100 can be configured to perform a collective communication operation, such as a gather or scatter collective communication operation, on a multi-level direct network topology (e.g., dragonfly topology). In an example implementation, the collective communication operation can be implemented for a multi-tier dragonfly topology.
A multi-tier dragonfly topology has a much more complicated structure than the simple two-tier topologies. This difference can allow for several enhancements such as scattering or distributing data across nodes in a switch so that as many inter-switch links as possible can be utilized simultaneously. Also, in a multi-tier dragonfly topology, there are multiple links (e.g., communication paths) per inter-group connection. These links can be utilized in a collective communication operation (e.g., an allgather operation) by causing multiple nodes to send data simultaneously through the inter-group connection. Further, in many other network topologies, nodes are capable of sending data simultaneously on all links originating from the node. This is not true in the multi-tier dragonfly topology. In the multi-tier dragonfly topology, typically there is only one outgoing connection from the node to the switch. The switch provides the node with direct connection to other nodes in the switch; however, the data cannot be sent simultaneously to other nodes in the collection of nodes. In a multi-tier dragonfly topology, the presence of many more links and multiple levels of links, as compared to a two-tier topology, may require many more steps of intelligent data flow rather than a simple three-step procedure as with other network topologies.
In an implementation, communication system 100 can leverage the hierarchical structure of the dragonfly network, reduce or completely avoid network contention by organizing the communication pattern, leverage available bandwidth by scattering the data across compute nodes, have enough nodes in each switch/group to utilize the all-to-all connections, and/or pipeline the final broadcast by segmenting large messages into small chunks. In a specific example, communication system 100 can be configured to avoid contention and have a performance level that is within one percent (1%) of a naïve lower bound.
In a specific example, communication system 100 can be configured to allow for a gather or scatter collective operation on a multi-tier dragonfly topology or some other interconnected network topology. In a gather or scatter collective operation, most or every process contributes a data item. The contributed data item is collected or consolidated (e.g., into memory, a single buffer, etc.) and the consolidated data can be made available to all the nodes. As explained above, in a dragonfly topology, there are direct connections at each tier of the topology. At the first tier of a three tier dragonfly topology, multiple nodes (e.g., nodes 110a,a-110a,o illustrated in
Communication system 100 can be configured to perform a data exchange in a hierarchical topology aware manner. In an example, as illustrated in
In some examples, such as when all the nodes in the network are not being used for a specific operation, it may be necessary for a node to send data to another node that is not directly in communication with the node. In these examples, the path may take one or more hops or go through one or more switches. For example, collection of nodes 104e in group 102a is directly in communication with a node in group 102e. If none of the nodes in collection of nodes 104e is involved in the process, then a node in collection of nodes 104a can communicate data to a node in group 102d that is involved in the process and that node in group 102d can communicate the data to a node in group 102e.
Communication system 100 can be configured such that the entire system data can be distributed across nodes in each group that are participating in the collective process. The data can be gathered on every node by the nodes exchanging data across switches such that each node coupled to each switch has the data of the entire group. Then, nodes in a collection of nodes exchange data with each other resulting in each node having the data of the entire system. The data can be broadcast to other processes on the node. In an example, the broadcast can be pipelined by dividing messages into chunks and broadcasting each chunk as it arrives instead of waiting for the entire data transfer to be complete.
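The pipelined broadcast described above can be sketched as follows; the chunk size, receiver labels, and sequential delivery loop are simplifying assumptions made for illustration.

```python
# Sketch of a pipelined broadcast: the message is divided into chunks and
# each chunk is forwarded as soon as it arrives, rather than waiting for
# the entire data transfer to complete.
def chunk_message(message, chunk_size):
    """Split a message into fixed-size chunks."""
    return [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]

def pipelined_broadcast(message, receivers, chunk_size):
    """Deliver each chunk to every receiver as it becomes available."""
    inboxes = {r: [] for r in receivers}
    for chunk in chunk_message(message, chunk_size):
        for r in receivers:                 # forward the chunk immediately
            inboxes[r].append(chunk)
    return {r: b"".join(chunks) for r, chunks in inboxes.items()}
```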
The process leverages the all-to-all direct connections at each level of the dragonfly topology and helps to ensure that the source and destination of the messages are within the same set of nodes (where a set of nodes is either a collection of nodes or a group of nodes) corresponding to a specific level of the topology. Therefore, there is no, or limited, network congestion or interference because the messages are contained within the specific level of the topology. Communication system 100 can help ensure that there are enough nodes (and that they have the required data) so that the all-to-all connections are maximally utilized.
In a specific implementation of a three-tier dragonfly topology, the three-tier dragonfly topology may have sixty four (64) processes/node, sixteen (16) nodes/switch, thirty-two (32) switches/group and one hundred and twenty eight (128) groups. Each process can contribute eight (8) bytes of data. The final data size may be 32 MB (8*64*16*32*128=32 MB). The system can be configured to gather data from all the processes on a node onto a single process per node (called the leader process of that node). Nodes within a collection of nodes are directly connected to each other and the nodes exchange each other's data in an all-to-all exchange pattern. This allows each node to have the data of every other node in communication with the same switch. The nodes can send the data to nodes in other collections of nodes. In an example, there can be thirty one (31) links that connect a switch to 31 other switches or 31 links that connect a collection of nodes to 31 other collections of nodes.
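The final data size in this specific implementation can be verified arithmetically:

```python
# 8 bytes/process x 64 processes/node x 16 nodes/switch
# x 32 switches/group x 128 groups = 32 MB of final data.
total_bytes = 8 * 64 * 16 * 32 * 128
assert total_bytes == 32 * 1024 * 1024      # 32 MB
```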
In addition, there may be 16 nodes included in each collection of nodes. Therefore, there is enough bandwidth available for every node to continuously send data. This allows every node to have the data of an entire group. The nodes can send data across groups such that each group has the data of the entire system and the data is distributed across nodes in the group. Nodes in a collection of nodes can send the data to those groups that are directly connected to the switch in the collection of nodes. The data being sent can be distributed across nodes in the destination switch (e.g., the switch coupled to the nodes that will receive the data). The nodes across switches in the same group of nodes can exchange data so that nodes in each collection of nodes have the data of the entire system. Then, nodes within each collection of nodes can exchange the data such that every node has the data of the entire system. There is no contention because the process ensures that the communication occurs within a switch and the data is broadcast to other processes on the node. Since the message size is very large (e.g., about 32 MB), the message can be broadcast as a pipeline where the message is divided into chunks and the broadcast is done concurrently as these chunks arrive. Note that the number of nodes, collections of nodes, groups, etc. does not need to be symmetric or similar and one node can communicate with different nodes in different collections of nodes or groups. It should be appreciated that the teachings and examples provided herein are readily scalable and applicable to a variety of collective network architectures.
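The overall exchange can be summarized, at a very high level, by the following sketch; the topology dictionary, naming, and strictly sequential phases are illustrative assumptions, and a real implementation would overlap communication on the actual links as described above.

```python
# Simplified sketch of the hierarchical allgather: (1) nodes under a
# switch exchange all-to-all, (2) switches within a group exchange, and
# (3) groups exchange, so every node ends with the data of the system.
def hierarchical_allgather(topology):
    """topology: {group: {switch: {node: item}}} -> {(g, s, n): items}."""
    held = {}
    # Phase 1: each node gathers the data of every node on its switch.
    for g, switches in topology.items():
        for s, nodes in switches.items():
            switch_items = list(nodes.values())
            for n in nodes:
                held[(g, s, n)] = list(switch_items)
    # Phase 2: switches in the same group exchange, so each node holds
    # the data of its entire group.
    for g, switches in topology.items():
        group_items = [i for nodes in switches.values() for i in nodes.values()]
        for s, nodes in switches.items():
            for n in nodes:
                held[(g, s, n)] = list(group_items)
    # Phase 3: groups exchange, so each node holds the data of the system.
    all_items = [i for sw in topology.values()
                 for nodes in sw.values() for i in nodes.values()]
    for key in held:
        held[key] = list(all_items)
    return held
```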
Turning to the infrastructure of
In communication system 100, network traffic, which is inclusive of packets, frames, signals (analog, digital or any combination of the two), data, etc., can be sent and received according to any suitable communication messaging protocols. Suitable communication messaging protocols can include MPI, a multi-layered scheme such as Open Systems Interconnected (OSI) model, or any derivations or variants thereof (e.g., Transmission Control Protocol/Internet Protocol (TCP/IP), user datagram protocol/IP (UDP/IP)). Additionally, radio signal communications (e.g., over a cellular network) may also be provided in communication system 100. Suitable interfaces and infrastructure may be provided to enable communication with the cellular network.
The term “packet” as used herein, refers to a unit of data that can be routed between a source node and a destination node on a packet switched network. A packet includes a source network address and a destination network address. These network addresses can be Internet Protocol (IP) addresses in a TCP/IP messaging protocol. The term “data” as used herein, refers to any type of binary, numeric, voice, video, textual, or script data, or any type of source or object code, or any other suitable information in any appropriate format that may be communicated from one point to another in electronic devices and/or networks. Additionally, messages, requests, responses, and queries are forms of network traffic, and therefore, may comprise packets, frames, signals, data, etc.
Turning to
Each collection of nodes 104a-104e can be in communication with another collection of nodes using a collection of nodes path. For example, collection of nodes 104a can be in communication with collection of nodes 104b using collection of nodes path 106a, with collection of nodes 104c using collection of nodes path 106f, with collection of nodes 104d using collection of nodes path 106g, and with collection of nodes 104e using collection of nodes path 106e. Collection of nodes 104b can be in communication with collection of nodes 104c using collection of nodes path 106b, with collection of nodes 104d using collection of nodes path 106j, and with collection of nodes 104e using collection of nodes path 106h. Collection of nodes 104c can be in communication with collection of nodes 104d using collection of nodes path 106c and with collection of nodes 104e using collection of nodes path 106i. Collection of nodes 104d can be in communication with collection of nodes 104e using collection of nodes path 106d.
Turning to
Nodes (e.g., nodes 110a,a-110a,o) and switches (e.g., switches 112a-112e) can include memory elements (e.g., memory 116) for storing information to be used in the operations outlined herein. Each node (e.g., nodes 110a,a-110a,o) and switch (e.g., switches 112a-112e) may keep information in any suitable memory element (e.g., random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), application specific integrated circuit (ASIC), non-volatile memory (NVRAM), magnetic storage, magneto-optical storage, flash storage (SSD), etc.), software, hardware, firmware, or in any other suitable component, device, element, or object where appropriate and based on particular needs. Any of the memory items discussed herein should be construed as being encompassed within the broad term ‘memory element.’ Moreover, the information being used, tracked, sent, or received in communication system 100 could be provided in any database, register, queue, table, cache, control list, or other storage structure, all of which can be referenced at any suitable timeframe. Any such storage options may also be included within the broad term ‘memory element’ as used herein.
Additionally, each node (e.g., nodes 110a,a-110a,o) and switch (e.g., switches 112a-112e) may include a processor (e.g., processor 118) that can execute software or an algorithm to perform activities as discussed herein. A processor can execute any type of instructions associated with the data to achieve the operations detailed herein. In one example, each processor can transform an element or an article (e.g., data) from one state or thing to another state or thing. In another example, the activities outlined herein may be implemented with fixed logic or programmable logic (e.g., software/computer instructions executed by a processor) and the elements identified herein could be some type of a programmable processor, programmable digital logic (e.g., a field programmable gate array (FPGA), an EPROM, an EEPROM) or an ASIC that includes digital logic, software, code, electronic instructions, or any suitable combination thereof. Any of the potential processing elements, modules, and machines described herein should be construed as being encompassed within the broad term ‘processor.’
In an example implementation, the nodes (e.g., nodes 110a,a-110a,o) in each collection of nodes 104a-104e are network elements, meant to encompass network appliances, servers (both virtual and physical), processors, modules, or any other suitable virtual or physical device, component, element, or object operable to process and exchange information in a collective communication network environment. Network elements may include any suitable hardware, software, components, modules, or objects that facilitate the operations thereof, as well as suitable interfaces for receiving, transmitting, and/or otherwise communicating data or information in a network environment. This may be inclusive of appropriate algorithms and communication protocols that allow for the effective exchange of data or information.
In certain example implementations, the functions outlined herein may be implemented by logic encoded in one or more tangible media (e.g., embedded logic provided in an ASIC, digital signal processor (DSP) instructions, software (potentially inclusive of object code and source code) to be executed by a processor, or other similar machine, etc.), which may be inclusive of non-transitory computer-readable media. In some of these instances, memory elements can store data used for the operations described herein. This includes the memory elements being able to store software, logic, code, or processor instructions that are executed to carry out the activities described herein.
In an example implementation, network elements of communication system 100, such as the nodes (e.g., nodes 110a,a-110a,o) in each collection of nodes 104a-104e and switches 112a-112e, may include software modules (e.g., data engine 120) to achieve, or to foster, operations as outlined herein. These modules may be suitably combined in any appropriate manner, which may be based on particular configuration and/or provisioning needs. In some embodiments, such operations may be carried out by hardware, implemented externally to these elements, or included in some other network device to achieve the intended functionality. Furthermore, the modules can be implemented as software, hardware, firmware, or any suitable combination thereof. These elements may also include software (or reciprocating software) that can coordinate with other network elements in order to achieve the operations, as outlined herein.
Turning to
Turning to
Turning to
Turning to
Turning to
Turning to
Turning to
In another example, node 110a,a,a may be in communication with switch 112a using two or more node paths. In a specific example, node 110a,a,a can divide the data into four parts and send the first part to node 110b,a,a in group 102b using communication path 108a,a, the second part to node 110b,a,b in group 102b using communication path 108a,b, the third part to node 110b,a,c in group 102b using communication path 108a,c, and the fourth part to node 110b,a,d in group 102b using communication path 108a,d. Note that node 110a,a,a may send each part of the data to any node in group 102b that is coupled to node 110a,a,a by a switch (e.g., switch 112a) and that the data being sent may be divided among as many communication paths as are included in group path 108a. Once the nodes (e.g., nodes 110b,a,a-110b,a,d) receive the portion of data from node 110a,a,a, the nodes can communicate the received portion to the other nodes until each node has the complete data. Each node (e.g., nodes 110a,a-110a,o illustrated in
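The division of data across the communication paths in a group path can be sketched as follows; the function name and the contiguous split are assumptions, with the number of portions equal to the number of paths as described above.

```python
# Split data into one contiguous portion per communication path; when the
# data does not divide evenly, earlier paths carry one extra element.
def split_across_paths(data, num_paths):
    base, extra = divmod(len(data), num_paths)
    portions, start = [], 0
    for i in range(num_paths):
        end = start + base + (1 if i < extra else 0)
        portions.append(data[start:end])
        start = end
    return portions

# Four communication paths (e.g., 108a,a-108a,d) -> four portions.
parts = split_across_paths(list(range(10)), 4)
```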
Turning to
Turning to
Turning to
Using this process, collection of nodes 104b,b will include the first portion of data, collection of nodes 104b,c will include the second portion of data, collection of nodes 104b,d will include the third portion of data, and collection of nodes 104b,e will include the fourth portion of data. To communicate the remaining portions of data to each of the collections of nodes, each node can send its portion of data to a different collection of nodes. For example, node 110b,a,a can send a first portion of the data to node 110b,c,a in collection of nodes 104b,c. Node 110b,a,b can send a second portion of the data to node 110b,d,h in collection of nodes 104b,d. Node 110b,a,c can send a third portion of the data to node 110b,e,n in collection of nodes 104b,e. Node 110b,a,d can send a fourth portion of the data to node 110b,b,d in collection of nodes 104b,b. Now collection of nodes 104b,b will include the first and fourth portions of data, collection of nodes 104b,c will include the first and second portions of data, collection of nodes 104b,d will include the second and third portions of data, and collection of nodes 104b,e will include the third and fourth portions of data. This process can be repeated until each collection of nodes has the full data. Once each collection of nodes has the full data, each collection of nodes can communicate the full data to each node in the collection of nodes as illustrated in
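The repeat-until-full exchange among collections of nodes can be sketched with a round count; the ring-like forwarding order is a simplifying assumption, since as noted above each node may send to any directly connected collection.

```python
# Each collection starts with one portion; every round it receives the
# portions its neighbor holds that it lacks, until all collections hold
# all portions.
def rotate_until_full(num_collections):
    n = num_collections
    held = [{i} for i in range(n)]          # collection i holds portion i
    rounds = 0
    while any(len(h) < n for h in held):
        newly = [held[(i - 1) % n] - held[i] for i in range(n)]
        for i in range(n):
            held[i] |= newly[i]
        rounds += 1
    return rounds
```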
Turning to
Turning to
Note that with the examples provided herein, interaction may be described in terms of two, three, or more network elements. However, this has been done for purposes of clarity and example only. In certain cases, it may be easier to describe one or more of the functionalities of a given set of flows by only referencing a limited number of network elements. It should be appreciated that communication system 100 and its teachings are readily scalable and can accommodate a large number of components, as well as more complicated/sophisticated arrangements and configurations. Accordingly, the examples provided should not limit the scope or inhibit the broad teachings of communication system 100 and as potentially applied to a myriad of other architectures. For the purposes of the present disclosure, the phrase “A and/or B” means (A), (B), or (A and B). For the purposes of the present disclosure, the phrase “A, B, and/or C” means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C).
Although the present disclosure has been described in detail with reference to particular arrangements and configurations, these example configurations and arrangements may be changed significantly without departing from the scope of the present disclosure. Moreover, certain components may be combined, separated, eliminated, or added based on particular needs and implementations. Additionally, although communication system 100 has been illustrated with reference to particular elements and operations that facilitate the communication process, these elements and operations may be replaced by any suitable architecture, protocols, and/or processes that achieve the intended functionality of communication system 100.
Numerous other changes, substitutions, variations, alterations, and modifications may be ascertained to one skilled in the art and it is intended that the present disclosure encompass all such changes, substitutions, variations, alterations, and modifications as falling within the scope of the appended claims. In order to assist the United States Patent and Trademark Office (USPTO) and, additionally, any readers of any patent issued on this application in interpreting the claims appended hereto, Applicant wishes to note that the Applicant: (a) does not intend any of the appended claims to invoke paragraph six (6) of 35 U.S.C. section 112 as it exists on the date of the filing hereof unless the words “means for” or “step for” are specifically used in the particular claims; and (b) does not intend, by any statement in the specification, to limit this disclosure in any way that is not otherwise reflected in the appended claims.
Example C1 is at least one machine readable storage medium having one or more instructions that when executed by at least one processor, cause the at least one processor to identify one or more processes on a node, where the node is part of a first collection of nodes, consolidate data from the one or more processes, communicate the consolidated data to a second node, where the second node is in the first collection of nodes, where the first collection of nodes is part of a first group of nodes, and communicate the consolidated data to a third node, where the third node is in a second collection of nodes, where the second collection of nodes is part of the first group of nodes.
In Example C2, the subject matter of Example C1 can optionally include where the instructions, when executed by the at least one processor, further cause the at least one processor to communicate the consolidated data to a fourth node, where the fourth node is part of a third collection of nodes, where the third collection of nodes is in a second group of nodes.
In Example C3, the subject matter of any one of Examples C1-C2 can optionally include where the instructions, when executed by the at least one processor, further cause the at least one processor to receive data related to a gather or a scatter process from the second node.
In Example C4, the subject matter of any one of Examples C1-C3 can optionally include where the received data related to the gather process is combined with the consolidated data before communicating the combined consolidated data to another node, in another collection of nodes, in another group of nodes.
In Example C5, the subject matter of any one of Examples C1-C4 can optionally include where the instructions, when executed by the at least one processor, further cause the at least one processor to receive data related to the gather process from a fourth node, where the fourth node is part of a third collection of nodes, where the third collection of nodes is in a second group of nodes.
In Example C6, the subject matter of any one of Examples C1-C5 can optionally include where the instructions, when executed by the at least one processor, further cause the at least one processor to communicate the data to the second node using a switch, where each node in the first collection of nodes is in communication with the switch.
In Example C7, the subject matter of any one of Examples C1-C6 can optionally include where the consolidated data is communicated to a different group using a pipeline, where the pipeline includes more than one communication path and the consolidated data is divided into portions, where the number of portions is equal to the number of communication paths.
In Example C8, the subject matter of any one of Examples C1-C7 can optionally include where the node is part of an interconnected network.
In Example C9, the subject matter of any one of Examples C1-C8 can optionally include where the node is part of a multi-tiered dragonfly topology network.
In Example S1, a system can include a plurality of a group of nodes, wherein each group of nodes includes a plurality of a collection of nodes, wherein each collection of nodes includes a plurality of nodes and at least one processor configured to identify one or more processes on a node, where the node is part of a first collection of nodes, consolidate data from the one or more processes, communicate the consolidated data to a second node, where the second node is in the first collection of nodes, where the first collection of nodes is part of a first group of nodes, and communicate the consolidated data to a third node, where the third node is in a second collection of nodes, where the second collection of nodes is part of the first group of nodes.
In Example S2, the subject matter of Example S1 can optionally include where the at least one processor is further configured to receive data from the second node, where the data is related to a gather or a scatter process.
In Example S3, the subject matter of any one of Examples S1-S2 can optionally include where the received data related to the gather process is combined with the consolidated data before communicating the combined consolidated data to another node, in another collection of nodes, in another group of nodes.
In Example S4, the subject matter of any one of Examples S1-S3 can optionally include where the at least one processor is further configured to receive data related to the gather process from a fourth node, where the fourth node is part of a third collection of nodes, where the third collection of nodes is in a second group of nodes included in the plurality of the group of nodes.
In Example S5, the subject matter of any one of Examples S1-S4 can optionally include where the at least one processor is further configured to communicate the data to the second node using a switch, where each node in the first collection of nodes is in communication with the switch.
In Example S6, the subject matter of any one of Examples S1-S5 can optionally include where the consolidated data is communicated to a different group from the plurality of nodes using a pipeline, where the pipeline includes more than one communication path and the consolidated data is divided into portions, where the number of portions is equal to the number of communication paths.
Example A1 is an apparatus for providing a collective communication operation, the apparatus comprising at least one memory element, at least one processor coupled to the at least one memory element, and one or more data engines that, when executed by the at least one processor, are configured to identify one or more processes on a node, where the node is part of a first collection of nodes, consolidate data from the one or more processes, communicate the consolidated data to a second node, where the second node is in the first collection of nodes, where the first collection of nodes is part of a first group of nodes, and communicate the consolidated data to a third node, where the third node is in a second collection of nodes, where the second collection of nodes is part of the first group of nodes.
In Example A2, the subject matter of Example A1 can optionally include where the one or more data engines, when executed by the at least one processor, are further configured to communicate the consolidated data to a fourth node, where the fourth node is part of a third collection of nodes, where the third collection of nodes is in a second group of nodes.
In Example A3, the subject matter of any one of the Examples A1-A2 can optionally include where the one or more data engines, when executed by the at least one processor, are further configured to receive data from the second node, where the data is related to a gather process or a scatter process.
In Example A4, the subject matter of any one of the Examples A1-A3 can optionally include where the received data related to the gather process is combined with the consolidated data before communicating the combined consolidated data to another node, in another collection of nodes, in another group of nodes.
In Example A5, the subject matter of any one of the Examples A1-A4 can optionally include where the consolidated data is communicated to a different group using a pipeline, where the pipeline includes more than one communication path and the consolidated data is divided into portions, where the number of portions is equal to the number of communication paths.
Example M1 is a method including identifying one or more collective communication processes on a node, where the node is part of a first collection of nodes, consolidating data from the one or more processes, communicating the consolidated data to a second node, where the second node is in the first collection of nodes, where the first collection of nodes is part of a first group of nodes, and communicating the consolidated data to a third node, where the third node is in a second collection of nodes, where the second collection of nodes is part of the first group of nodes.
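The method of Example M1 can be sketched as follows. This is a hedged illustration only (function names and the list-based "receive buffers" are hypothetical stand-ins, not from the disclosure): data from the processes on one node is consolidated, then communicated to a second node in the same collection and to a third node in a second collection of the same group.

```python
def consolidate(process_data):
    """Combine the per-process data from a single node into one message."""
    return [item for data in process_data for item in data]

def collective_step(process_data, same_collection_node, other_collection_node):
    """Consolidate local data, then send it to a peer in the first
    collection and to a peer in a second collection of the same group."""
    consolidated = consolidate(process_data)
    same_collection_node.append(consolidated)   # second node, first collection
    other_collection_node.append(consolidated)  # third node, second collection
    return consolidated

second_node, third_node = [], []  # stand-ins for the remote receive buffers
msg = collective_step([[1, 2], [3]], second_node, third_node)
assert msg == [1, 2, 3]
assert second_node == [msg] and third_node == [msg]
```

In a real interconnected network, the appends would be replaced by transfers over the network fabric; the sketch only shows the ordering of the consolidate and communicate steps.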
In Example M2, the subject matter of Example M1 can optionally include communicating the consolidated data to a fourth node, where the fourth node is part of a third collection of nodes, where the third collection of nodes is in a second group of nodes.
In Example M3, the subject matter of any one of the Examples M1-M2 can optionally include receiving data from the second node, where the data is related to a gather process or a scatter process.
In Example M4, the subject matter of any one of the Examples M1-M3 can optionally include communicating the data to the second node using a switch, where each node in the first collection of nodes is in communication with the switch.
In Example M5, the subject matter of any one of the Examples M1-M4 can optionally include where the consolidated data is communicated to a different group using a pipeline, where the pipeline includes more than one communication path and the consolidated data is divided into portions, where the number of portions is equal to the number of communication paths.
Example AA1 is an apparatus including means for consolidating data from one or more processes on a node, wherein the node is part of a first collection of nodes, means for communicating the consolidated data to a second node, wherein the second node is in the first collection of nodes, wherein the first collection of nodes is part of a first group of a collection of nodes, and means for communicating the consolidated data to a third node, wherein the third node is in a second collection of nodes, wherein the second collection of nodes is part of the first group of the collection of nodes.
In Example AA2, the subject matter of Example AA1 can optionally include means for communicating the consolidated data to a fourth node, wherein the fourth node is part of a third collection of nodes, wherein the third collection of nodes is in a second group of a collection of nodes.
In Example AA3, the subject matter of any one of Examples AA1-AA2 can optionally include means for receiving data related to a gather process or a scatter process from the second node.
In Example AA4, the subject matter of any one of Examples AA1-AA3 can optionally include where the received data related to the gather process is combined with the consolidated data before communicating the combined consolidated data to another node, in another collection of nodes, in another group of nodes.
In Example AA5, the subject matter of any one of Examples AA1-AA4 can optionally include means for receiving data related to the gather process from a fourth node, wherein the fourth node is part of a third collection of nodes, wherein the third collection of nodes is in a second group of a collection of nodes.
In Example AA6, the subject matter of any one of Examples AA1-AA5 can optionally include means for communicating the consolidated data to the second node using a switch, wherein each node in the first collection of nodes is in communication with the switch.
In Example AA7, the subject matter of any one of Examples AA1-AA6 can optionally include where the consolidated data is communicated to a different group using a pipeline, where the pipeline includes more than one communication path and the consolidated data is divided into portions, where the number of portions is equal to the number of communication paths.
In Example AA8, the subject matter of any one of Examples AA1-AA7 can optionally include where the node is part of an interconnected network.
In Example AA9, the subject matter of any one of Examples AA1-AA8 can optionally include where the node is part of a multi-tiered dragonfly topology network.
Example X1 is a machine-readable storage medium including machine-readable instructions to implement a method or realize an apparatus as in any one of the Examples A1-A5 or M1-M5.
Example Y1 is an apparatus comprising means for performing any one of the Example methods M1-M5.
In Example Y2, the subject matter of Example Y1 can optionally include the means for performing the method comprising a processor and a memory.
In Example Y3, the subject matter of Example Y2 can optionally include the memory comprising machine-readable instructions.