The present invention relates to high performance computing systems in general.
In high performance computing (HPC) systems, many applications are written in a manner that requires communication between the entities which perform portions of the work (termed herein “processes”).
Part of the communication includes collective operations such as (by way of non-limiting example) computing the sum of multiple vectors (an element-wise add operation) contributed by multiple processes, one vector per process, and sending a copy of the resulting vector to all the participating processes; this operation is called “all-reduce”. Another non-limiting example would be sending the result to only one of the processes; this operation is called “reduce”.
In addition to compute operations (that is, operations such as reduce and all-reduce, which involve mathematical operators), there are data movement commands such as (by way of non-limiting example) all2all, gather, all gather, gather v, all gather v, scatter, and so on. Commands of this type are defined in the well-known Message Passing Interface specification and in other communication Application Programming Interface definitions.
The present invention, in certain embodiments thereof, seeks to provide an improved system and method for high performance computing.
There is thus provided in accordance with an exemplary embodiment of the present invention a method including providing a SHARP tree including a plurality of data receiving processes and at least one aggregation node, designating a data movement command, providing a plurality of data input vectors to each of the plurality of data receiving processes, respectively, the plurality of data receiving processes each passing on the respective received data input vector to the at least one aggregation node, and the at least one aggregation node carrying out the data movement command on the received plurality of data input vectors.
Further in accordance with an exemplary embodiment of the present invention the data movement command includes one of the following: gather, all gather, gather v, and all gather v.
Still further in accordance with an exemplary embodiment of the present invention the at least one aggregation node produces an output vector.
Additionally in accordance with an exemplary embodiment of the present invention at least one of the plurality of data input vectors includes a sparse vector.
Moreover in accordance with an exemplary embodiment of the present invention the at least one aggregation node utilizes a SHARP protocol.
There is also provided in accordance with another exemplary embodiment of the present invention apparatus including a SHARP tree including a plurality of data receiving processes and at least one aggregation node, the SHARP tree being configured to perform the following: receiving a data movement command, receiving a plurality of data input vectors to each of the plurality of data receiving processes, respectively, the data receiving processes each passing on the respective received data input vector to the at least one aggregation node, and the at least one aggregation node carrying out the data movement command on the received plurality of data input vectors.
Further in accordance with an exemplary embodiment of the present invention the data movement command includes one of the following: gather, all gather, gather v, and all gather v.
Still further in accordance with an exemplary embodiment of the present invention the at least one aggregation node is configured to produce an output vector.
Additionally in accordance with an exemplary embodiment of the present invention at least one of the plurality of data input vectors includes a sparse vector.
Moreover in accordance with an exemplary embodiment of the present invention the at least one aggregation node utilizes a SHARP protocol.
The present invention will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings.
By way of introduction (but without limiting the generality of the present application), the concept behind exemplary embodiments of the present invention is to use the SHARP protocol to accelerate at least the following operations: gather, all gather, gather v, and all gather v.
The SHARP algorithm/protocol is described in US Published Patent Application 2017/0063613 of Bloch et al, the disclosure of which is hereby incorporated herein by reference.
In certain exemplary embodiments of the present invention, computations which may be described herein as if they occur in a serial manner may be executed in such a way that computation already begins while data is still being received.
The inventors of the present invention believe that the current methodology for addressing the above-mentioned scenarios is to use software algorithms to perform the operations mentioned above. Such software algorithms involve a large amount of data transfer/movement. Significant overhead may also be generated on a CPU which manages the algorithms, since multiple packets, each with a small amount of data, are sent; in some cases, identical packets are sent multiple times to multiple destinations, creating additional packet/bandwidth overhead, including header overhead. In addition, such software algorithms may create large latency when a large number of processes are involved.
In exemplary embodiments of the present invention, one goal is to offload work from the managing CPU(s) by simplifying the process using the SHARP algorithm/protocol (referred to above and described in US Published Patent Application 2017/0063613 of Bloch et al, the disclosure of which has been incorporated herein by reference); latency, as well as bandwidth consumption, may also be reduced in such exemplary embodiments.
In exemplary embodiments of the present invention, the SHARP algorithm/protocol (referred to above) is used to implement at least: gather; gather v; all gather; and all gather v.
In general, in exemplary embodiments of the present invention, the gather operation is treated as a regular reduction operation by using a new data format supporting sparse representation. When using the sparse representation, each process/aggregation point sends its data to the SHARP network while allowing the aggregation to move forward; this is different from a regular aggregation operation, which assumes that each one of the processes/aggregation points contributes a vector of exactly the same size.
It is appreciated that, in certain exemplary embodiments, within the SHARP network, a sparse representation may be converted to a dense representation; in certain cases, a dense representation may be able to be processed with greater efficiency. It is also appreciated that, in such a case, both sparse and dense representations may co-exist simultaneously in different points within the SHARP network.
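By way of non-limiting illustration, the following minimal sketch (in Python, used herein purely for exposition and not part of the SHARP protocol) shows one way in which a sparse representation, modeled as a mapping from index to value, might be converted to a dense representation; the fixed vector length and the zero fill value are assumptions of the sketch:

    def sparse_to_dense(sparse, length, fill=0):
        # Convert a sparse vector, modeled as {index: value}, into a
        # dense list of the given length; absent indexes receive `fill`.
        dense = [fill] * length
        for index, value in sparse.items():
            dense[index] = value
        return dense

    # Example: sparse_to_dense({0: 7, 3: 2}, 5) -> [7, 0, 0, 2, 0]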
In exemplary embodiments of the present invention, the following protocol updates are made relative to the SHARP protocol (see US Published Patent Application 2017/0063613 of Bloch et al, the disclosure of which has been incorporated herein by reference):
The following is one possible non-limiting example of an appropriate data format useable in exemplary embodiments of the present invention:
Index | data size in bytes | data[ ]

Here, data[ ] represents a list of data elements, each of which can be byte/s or bit/s. A special index value is reserved to mark the end of the data.
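By way of non-limiting illustration, the following sketch shows one possible byte-level encoding of such a stream of elements; the four-byte field widths and the particular reserved end-of-data index value are assumptions of the sketch, not part of the format defined above:

    import struct

    END_OF_DATA = 0xFFFFFFFF  # assumed reserved index marking the end of the data

    def encode(elements):
        # Encode a list of (index, data bytes) pairs as:
        # index | data size in bytes | data[ ], terminated by the
        # reserved end-of-data index.
        out = bytearray()
        for index, data in elements:
            out += struct.pack("<II", index, len(data)) + data
        out += struct.pack("<II", END_OF_DATA, 0)
        return bytes(out)

    def decode(buf):
        # Decode such a stream back into a list of (index, data bytes) pairs.
        elements, offset = [], 0
        while True:
            index, size = struct.unpack_from("<II", buf, offset)
            offset += 8
            if index == END_OF_DATA:
                return elements
            elements.append((index, bytes(buf[offset:offset + size])))
            offset += size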
In exemplary embodiments, the following is an example of SHARP protocol behavior when using sparse data: an aggregation point examines the data that arrives. If two contributions carry a matching index, the aggregation point applies the operation specified in the operation header to the data at that index; if an index appears in only one contribution, the aggregation point adds that single index's data to the result vector (that is, concatenates the single data element onto the result vector).
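A minimal sketch of this behavior follows, with element-wise addition standing in for the operation named in the operation header; modeling a sparse vector as a dictionary from index to data is likewise an assumption of the sketch:

    def aggregate(a, b, op=lambda x, y: x + y):
        # Merge two sparse vectors, each modeled as {index: data}.
        # A matching index is combined using the operation from the
        # operation header (assumed here to be addition); an index
        # present in only one input is concatenated into the result.
        result = dict(a)
        for index, data in b.items():
            if index in result:
                result[index] = op(result[index], data)
            else:
                result[index] = data
        return result

    # Example: aggregate({1: 10}, {1: 5, 2: 7}) -> {1: 15, 2: 7}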
The following, in exemplary embodiments, are non-limiting implementations of various operations:
1. How to implement gather:
Each process sends its data with index=rank_id and data_size=the size of the data it contributes. The result vector will include data from all the processes, because each index is unique and the aggregation nodes concatenate all the received data.
In gather, each process sends data, and that data is (at the end of the gather) held by a single process.
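Continuing the sketch above, gather then reduces to each process contributing a single element keyed by its unique rank id, so that the aggregation tree only ever concatenates; the flat, single-step model of the tree is an assumption of the sketch:

    def gather(contributions):
        # Model of gather: contributions is {rank_id: data}. Each
        # process sends index=rank_id, so every index is unique, the
        # aggregation nodes concatenate everything, and the indexes
        # allow the root to order the result vector by rank.
        return [contributions[rank_id] for rank_id in sorted(contributions)]

    # Example: gather({2: b"cc", 0: b"aa", 1: b"bb"}) -> [b"aa", b"bb", b"cc"]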
2. How to implement all-gather:
Similar to gather, but the SHARP protocol is asked to send the result to all the processes that contributed to the operation. In all-gather, each process sends data, and that data is sent to all other processes.
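Under the same illustrative model, all-gather differs from gather only in where the result is delivered:

    def all_gather(contributions):
        # Model of all-gather: the same aggregation as gather, but a
        # copy of the result vector is delivered to every process
        # that contributed, rather than to a single root process.
        result = [contributions[rank_id] for rank_id in sorted(contributions)]
        return {rank_id: list(result) for rank_id in contributions}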
3. How to implement gather v:
Each process sends data of a size that may vary per process, together with an indication of the amount of data it sent; the SHARP protocol generates a result that includes all the variable-size data. A rank id, identifying the sending process, may also be sent.
The addition of “v” indicates that each process may send a vector of any size which that process wishes to send; otherwise, gather v is similar to gather.
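A sketch of gather v under the same illustrative model follows; recording the data size (and rank id) alongside each segment reflects the optional metadata described above:

    def gather_v(contributions):
        # Model of gather v: contributions is {rank_id: data}, where
        # each data item may have a different length. The result
        # records, per rank, the data size alongside the data itself,
        # so receivers can delimit the variable-size segments.
        return [(rank_id, len(data), data)
                for rank_id, data in sorted(contributions.items())]

    # Example: gather_v({0: b"a", 1: b"bbbb"}) -> [(0, 1, b"a"), (1, 4, b"bbbb")]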
4. How to implement all-gather v:
Similar to gather v, but the SHARP protocol is asked to send the result to all the processes that contributed to the operation. As with gather v, in certain exemplary embodiments the amount of data sent, and/or a rank id, may be provided by the processes which send data, although it is appreciated that including such information is generally optional.
The addition of “v” indicates that each process may send a vector of any size which that process wishes to send; otherwise, all-gather v is similar to all-gather.
As indicated above, the data format is exemplary only, and is in no way meant to be limiting. Without limiting the generality of the foregoing, it is appreciated that certain optimizations, including compression of meta-data (index and data size), may be used.
In exemplary embodiments, the present invention utilizes the SHARP protocol all-reduce capability, in which processes send data for reduction. However, the all gather operation differs from previously-used examples of SHARP all-reduce: in all-reduce, each process sends a vector of size X and the result is also of size X (an element-wise operation is performed). In all gather or all gather v, each process j sends a vector of size Xj, and the result vector is of size Sum(Xj). In order to support this operation, each process sends its own data Xj, and all of the data is gathered into a single large vector. Persons skilled in the art will appreciate how the same principles apply to the other operations described herein.
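By way of non-limiting numerical illustration: with three processes sending vectors of sizes X1=2, X2=5, and X3=3 elements, all-reduce would require X1=X2=X3=X and would return a vector of size X, whereas all gather (or all gather v) returns a single vector of size Sum(Xj)=2+5+3=10 elements.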
Reference is now made to the accompanying drawings, which depict a non-limiting example of aggregation using the sparse vector format described above.
Assuming, as described above, that the sparse vector format includes indexes, the aggregation node will be able to generate an ordered vector, as depicted in the drawings.
The aggregation operation will be carried out (in a manner more complex than that shown in the drawings, which are simplified for purposes of illustration) by the at least one aggregation node.
In the general case, the aggregation node will not assume that the indexes are consecutive; for example, in the particular non-limiting example depicted in the drawings, the indexes received by the aggregation node need not be consecutive.
It is appreciated that software components of the present invention may, if desired, be implemented in ROM (read only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques. It is further appreciated that the software components may be instantiated, for example: as a computer program product or on a tangible medium. In some cases, it may be possible to instantiate the software components as a signal interpretable by an appropriate computer, although such an instantiation may be excluded in certain embodiments of the present invention.
It is appreciated that various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable subcombination.
It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather, the scope of the invention is defined by the appended claims and equivalents thereof.
The present application claims priority from U.S. Provisional Patent Application Ser. No. 62/807,266 of Levi et al, filed 19 Feb. 2019.
Number | Name | Date | Kind |
---|---|---|---|
4933969 | Marshall et al. | Jun 1990 | A |
5068877 | Near et al. | Nov 1991 | A |
5325500 | Bell et al. | Jun 1994 | A |
5353412 | Douglas et al. | Oct 1994 | A |
5404565 | Gould | Apr 1995 | A |
5606703 | Brady et al. | Feb 1997 | A |
5944779 | Blum | Aug 1999 | A |
6041049 | Brady | Mar 2000 | A |
6370502 | Wu | Apr 2002 | B1 |
6483804 | Muller et al. | Nov 2002 | B1 |
6507562 | Kadansky | Jan 2003 | B1 |
6728862 | Wilson | Apr 2004 | B1 |
6857004 | Howard et al. | Feb 2005 | B1 |
6937576 | Di Benedetto et al. | Aug 2005 | B1 |
7102998 | Golestani | Sep 2006 | B1 |
7124180 | Ranous | Oct 2006 | B1 |
7164422 | Wholey, III et al. | Jan 2007 | B1 |
7171484 | Krause et al. | Jan 2007 | B1 |
7313582 | Bhanot et al. | Dec 2007 | B2 |
7327693 | Rivers et al. | Feb 2008 | B1 |
7336646 | Muller | Feb 2008 | B2 |
7346698 | Hannaway | Mar 2008 | B2 |
7555549 | Campbell et al. | Jun 2009 | B1 |
7613774 | Caronni et al. | Nov 2009 | B1 |
7636424 | Halikhedkar et al. | Dec 2009 | B1 |
7636699 | Stanfill | Dec 2009 | B2 |
7738443 | Kumar | Jun 2010 | B2 |
8213315 | Crupnicoff et al. | Jul 2012 | B2 |
8380880 | Gulley et al. | Feb 2013 | B2 |
8510366 | Anderson et al. | Aug 2013 | B1 |
8738891 | Karandikar | May 2014 | B1 |
8761189 | Shachar et al. | Jun 2014 | B2 |
8768898 | Trimmer et al. | Jul 2014 | B1 |
8775698 | Archer et al. | Jul 2014 | B2 |
8811417 | Bloch et al. | Aug 2014 | B2 |
9110860 | Shahar | Aug 2015 | B2 |
9189447 | Faraj | Nov 2015 | B2 |
9294551 | Froese et al. | Mar 2016 | B1 |
9344490 | Bloch et al. | May 2016 | B2 |
9563426 | Bent et al. | Feb 2017 | B1 |
9626329 | Howard | Apr 2017 | B2 |
9756154 | Jiang | Sep 2017 | B1 |
10015106 | Florissi et al. | Jul 2018 | B1 |
10158702 | Bloch et al. | Dec 2018 | B2 |
10284383 | Bloch et al. | May 2019 | B2 |
10296351 | Kohn et al. | May 2019 | B1 |
10305980 | Gonzalez et al. | May 2019 | B1 |
10318306 | Kohn et al. | Jun 2019 | B1 |
10425350 | Florissi | Sep 2019 | B1 |
10521283 | Shuler et al. | Dec 2019 | B2 |
10541938 | Timmerman et al. | Jan 2020 | B1 |
10621489 | Appuswamy et al. | Apr 2020 | B2 |
20020010844 | Noel et al. | Jan 2002 | A1 |
20020035625 | Tanaka | Mar 2002 | A1 |
20020150094 | Cheng et al. | Oct 2002 | A1 |
20020150106 | Kagan et al. | Oct 2002 | A1 |
20020152315 | Kagan et al. | Oct 2002 | A1 |
20020152327 | Kagan et al. | Oct 2002 | A1 |
20020152328 | Kagan et al. | Oct 2002 | A1 |
20030018828 | Craddock et al. | Jan 2003 | A1 |
20030061417 | Craddock et al. | Mar 2003 | A1 |
20030065856 | Kagan et al. | Apr 2003 | A1 |
20040062258 | Grow et al. | Apr 2004 | A1 |
20040078493 | Blumrich et al. | Apr 2004 | A1 |
20040120331 | Rhine et al. | Jun 2004 | A1 |
20040123071 | Stefan et al. | Jun 2004 | A1 |
20040252685 | Kagan et al. | Dec 2004 | A1 |
20040260683 | Chan | Dec 2004 | A1 |
20050097300 | Gildea et al. | May 2005 | A1 |
20050122329 | Janus | Jun 2005 | A1 |
20050129039 | Biran et al. | Jun 2005 | A1 |
20050131865 | Jones et al. | Jun 2005 | A1 |
20050281287 | Ninomi et al. | Dec 2005 | A1 |
20060282838 | Gupta et al. | Dec 2006 | A1 |
20070127396 | Jain et al. | Jun 2007 | A1 |
20070162236 | Lamblin | Jul 2007 | A1 |
20080104218 | Liang et al. | May 2008 | A1 |
20080126564 | Wilkinson | May 2008 | A1 |
20080168471 | Benner et al. | Jul 2008 | A1 |
20080181260 | Vonog et al. | Jul 2008 | A1 |
20080192750 | Ko et al. | Aug 2008 | A1 |
20080244220 | Lin et al. | Oct 2008 | A1 |
20080263329 | Archer et al. | Oct 2008 | A1 |
20080288949 | Bohra et al. | Nov 2008 | A1 |
20080298380 | Rittmeyer et al. | Dec 2008 | A1 |
20080307082 | Cai et al. | Dec 2008 | A1 |
20090037377 | Archer et al. | Feb 2009 | A1 |
20090063816 | Arimilli et al. | Mar 2009 | A1 |
20090063817 | Arimilli et al. | Mar 2009 | A1 |
20090063891 | Arimilli et al. | Mar 2009 | A1 |
20090182814 | Tapolcai et al. | Jul 2009 | A1 |
20090247241 | Gollnick et al. | Oct 2009 | A1 |
20090292905 | Faraj | Nov 2009 | A1 |
20100017420 | Archer et al. | Jan 2010 | A1 |
20100049836 | Kramer | Feb 2010 | A1 |
20100074098 | Zeng et al. | Mar 2010 | A1 |
20100095086 | Eichenberger et al. | Apr 2010 | A1 |
20100185719 | Howard | Jul 2010 | A1 |
20100241828 | Yu et al. | Sep 2010 | A1 |
20110060891 | Jia | Mar 2011 | A1 |
20110066649 | Berlyant et al. | Mar 2011 | A1 |
20110093258 | Xu | Apr 2011 | A1 |
20110119673 | Bloch et al. | May 2011 | A1 |
20110173413 | Chen et al. | Jul 2011 | A1 |
20110219208 | Asaad | Sep 2011 | A1 |
20110238956 | Arimilli et al. | Sep 2011 | A1 |
20110258245 | Blocksome et al. | Oct 2011 | A1 |
20110276789 | Chambers et al. | Nov 2011 | A1 |
20120063436 | Thubert et al. | Mar 2012 | A1 |
20120117331 | Krause et al. | May 2012 | A1 |
20120131309 | Johnson | May 2012 | A1 |
20120216021 | Archer et al. | Aug 2012 | A1 |
20120254110 | Takemoto | Oct 2012 | A1 |
20130117548 | Grover et al. | May 2013 | A1 |
20130159410 | Lee et al. | Jun 2013 | A1 |
20130318525 | Palanisamy et al. | Nov 2013 | A1 |
20130336292 | Kore et al. | Dec 2013 | A1 |
20140019574 | Cardona et al. | Jan 2014 | A1 |
20140033217 | Vajda | Jan 2014 | A1 |
20140047341 | Breternitz et al. | Feb 2014 | A1 |
20140095779 | Forsyth et al. | Apr 2014 | A1 |
20140122831 | Uliel et al. | May 2014 | A1 |
20140189308 | Hughes et al. | Jul 2014 | A1 |
20140211804 | Makikeni et al. | Jul 2014 | A1 |
20140258438 | Ayoub | Sep 2014 | A1 |
20140280420 | Khan | Sep 2014 | A1 |
20140281370 | Khan | Sep 2014 | A1 |
20140362692 | Wu et al. | Dec 2014 | A1 |
20140365548 | Mortensen | Dec 2014 | A1 |
20150106578 | Warfield et al. | Apr 2015 | A1 |
20150143076 | Khan | May 2015 | A1 |
20150143077 | Khan | May 2015 | A1 |
20150143078 | Khan et al. | May 2015 | A1 |
20150143079 | Khan | May 2015 | A1 |
20150143085 | Khan | May 2015 | A1 |
20150143086 | Khan | May 2015 | A1 |
20150154058 | Miwa et al. | Jun 2015 | A1 |
20150178211 | Hiramoto | Jun 2015 | A1 |
20150180785 | Annamraju | Jun 2015 | A1 |
20150188987 | Reed et al. | Jul 2015 | A1 |
20150193271 | Archer et al. | Jul 2015 | A1 |
20150212972 | Boettcher et al. | Jul 2015 | A1 |
20150269116 | Raikin et al. | Sep 2015 | A1 |
20150278347 | Meyer | Oct 2015 | A1 |
20150365494 | Cardona et al. | Dec 2015 | A1 |
20150379022 | Puig et al. | Dec 2015 | A1 |
20160055225 | Xu et al. | Feb 2016 | A1 |
20160092362 | Barron et al. | Mar 2016 | A1 |
20160105494 | Reed et al. | Apr 2016 | A1 |
20160112531 | Milton et al. | Apr 2016 | A1 |
20160117277 | Raindel et al. | Apr 2016 | A1 |
20160179537 | Kunzman | Jun 2016 | A1 |
20160219009 | French | Jul 2016 | A1 |
20160248656 | Anand et al. | Aug 2016 | A1 |
20160299872 | Vaidyanathan et al. | Oct 2016 | A1 |
20160342568 | Burchard et al. | Nov 2016 | A1 |
20160352598 | Reinhardt et al. | Dec 2016 | A1 |
20160364350 | Sanghi et al. | Dec 2016 | A1 |
20170063613 | Bloch | Mar 2017 | A1 |
20170093715 | McGhee et al. | Mar 2017 | A1 |
20170116154 | Palmer et al. | Apr 2017 | A1 |
20170187496 | Shalev et al. | Jun 2017 | A1 |
20170187589 | Pope et al. | Jun 2017 | A1 |
20170187629 | Shalev et al. | Jun 2017 | A1 |
20170187846 | Shalev et al. | Jun 2017 | A1 |
20170199844 | Burchard et al. | Jul 2017 | A1 |
20170262517 | Horowitz | Sep 2017 | A1 |
20170344589 | Kafai | Nov 2017 | A1 |
20180004530 | Vorbach | Jan 2018 | A1 |
20180046901 | Xie et al. | Feb 2018 | A1 |
20180047099 | Bonig et al. | Feb 2018 | A1 |
20180089278 | Bhattacharjee et al. | Mar 2018 | A1 |
20180091442 | Chen et al. | Mar 2018 | A1 |
20180097721 | Matsui et al. | Apr 2018 | A1 |
20180173673 | Daglis et al. | Jun 2018 | A1 |
20180262551 | Demeyer et al. | Sep 2018 | A1 |
20180285316 | Thorson et al. | Oct 2018 | A1 |
20180287928 | Levi et al. | Oct 2018 | A1 |
20180302324 | Kasuya | Oct 2018 | A1 |
20180321912 | Li et al. | Nov 2018 | A1 |
20180321938 | Boswell et al. | Nov 2018 | A1 |
20180349212 | Liu et al. | Dec 2018 | A1 |
20180367465 | Levi | Dec 2018 | A1 |
20180375781 | Chen et al. | Dec 2018 | A1 |
20190018805 | Benisty | Jan 2019 | A1 |
20190026250 | Das Sarma et al. | Jan 2019 | A1 |
20190044889 | Serres et al. | Feb 2019 | A1 |
20190065208 | Liu et al. | Feb 2019 | A1 |
20190068501 | Schneider et al. | Feb 2019 | A1 |
20190102179 | Fleming et al. | Apr 2019 | A1 |
20190102338 | Tang et al. | Apr 2019 | A1 |
20190102640 | Balasubramanian | Apr 2019 | A1 |
20190114533 | Ng et al. | Apr 2019 | A1 |
20190121388 | Knowles et al. | Apr 2019 | A1 |
20190138638 | Pal et al. | May 2019 | A1 |
20190147092 | Pal et al. | May 2019 | A1 |
20190149488 | Bansal et al. | May 2019 | A1 |
20190171612 | Shahar et al. | Jun 2019 | A1 |
20190235866 | Das Sarma et al. | Aug 2019 | A1 |
20190303168 | Fleming, Jr. et al. | Oct 2019 | A1 |
20190303263 | Fleming, Jr. et al. | Oct 2019 | A1 |
20190324431 | Cella et al. | Oct 2019 | A1 |
20190339688 | Cella et al. | Nov 2019 | A1 |
20190347099 | Eapen et al. | Nov 2019 | A1 |
20190369994 | Parandeh Afshar et al. | Dec 2019 | A1 |
20190377580 | Vorbach | Dec 2019 | A1 |
20190379714 | Levi et al. | Dec 2019 | A1 |
20200005859 | Chen et al. | Jan 2020 | A1 |
20200034145 | Bainville et al. | Jan 2020 | A1 |
20200057748 | Danilak | Feb 2020 | A1 |
20200103894 | Cella et al. | Apr 2020 | A1 |
20200106828 | Elias et al. | Apr 2020 | A1 |
20200137013 | Jin et al. | Apr 2020 | A1 |
20210203621 | Ylisirnio et al. | Jul 2021 | A1 |
Entry |
---|
Danalis et al., “PTG: an abstraction for unhindered parallelism”, 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, pp. 1-10, Nov. 17, 2014. |
Cosnard et al., “Symbolic Scheduling of Parameterized Task Graphs on Parallel Machines,” Combinatorial Optimization book series (COOP, vol. 7), pp. 217-243, year 2000. |
Jeannot et al., “Automatic Multithreaded Parallel Program Generation for Message Passing Multiprocessors using Parameterized Task Graphs”, World Scientific, pp. 1-8, Jul. 23, 2001. |
Stone, “An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations,” Journal of the Association for Computing Machinery, vol. 10, No. 1, pp. 27-38, Jan. 1973. |
Kogge et al., “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Transactions on Computers, vol. C-22, No. 8, pp. 786-793, Aug. 1973. |
Hoefler et al., “Message Progression in Parallel Computing—To Thread or not to Thread?”, 2008 IEEE International Conference on Cluster Computing, pp. 1-10, Tsukuba, Japan, Sep. 29-Oct. 1, 2008. |
U.S. Appl. No. 16/357,356 office action dated May 14, 2020. |
European Application # 20156490.3 search report dated Jun. 25, 2020. |
Bruck et al., “Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems”, IEEE Transactions on Parallel and Distributed Systems, vol. 8, No. 11, pp. 1143-1156, Nov. 1997. |
Bruck et al., “Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems”, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, pp. 298-309, Aug. 1, 1994. |
Chiang et al., “Toward supporting data parallel programming on clusters of symmetric multiprocessors”, Proceedings International Conference on Parallel and Distributed Systems, pp. 607-614, Dec. 14, 1998. |
Gainaru et al., “Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All”, Proceedings of the 23rd European MPI Users' Group Meeting, pp. 167-179, Sep. 2016. |
Pjesivac-Grbovic et al., “Performance Analysis of MPI Collective Operations”, 19th IEEE International Parallel and Distributed Processing Symposium, pp. 1-19, 2005. |
Mellanox Technologies, “InfiniScale IV: 36-port 40GB/s Infiniband Switch Device”, pp. 1-2, year 2009. |
Mellanox Technologies Inc., “Scaling 10Gb/s Clustering at Wire-Speed”, pp. 1-8, year 2006. |
IEEE 802.1D Standard “IEEE Standard for Local and Metropolitan Area Networks—Media Access Control (MAC) Bridges”, IEEE Computer Society, pp. 1-281, Jun. 9, 2004. |
IEEE 802.1AX Standard “IEEE Standard for Local and Metropolitan Area Networks—Link Aggregation”, IEEE Computer Society, pp. 1-163, Nov. 3, 2008. |
Turner et al., “Multirate Clos Networks”, IEEE Communications Magazine, pp. 1-11, Oct. 2003. |
Thayer School of Engineering, “A Slightly Edited Local Copy of Elements of Lectures 4 and 5”, Dartmouth College, pp. 1-5, Jan. 15, 1998 http://people.seas.harvard.edu/˜jones/cscie129/nu_lectures/lecture11/switching/clos_network/clos_network.html |
“MPI: A Message-Passing Interface Standard,” Message Passing Interface Forum, version 3.1, pp. 1-868, Jun. 4, 2015. |
Coti et al., “MPI Applications on Grids: a Topology Aware Approach,” Proceedings of the 15th International European Conference on Parallel and Distributed Computing (EuroPar'09), pp. 1-12, Aug. 2009. |
Petrini et al., “The Quadrics Network (QsNet): High-Performance Clustering Technology,” Proceedings of the 9th IEEE Symposium on Hot Interconnects (Hotl'01), pp. 1-6, Aug. 2001. |
Sancho et al., “Efficient Offloading of Collective Communications in Large-Scale Systems,” Proceedings of the 2007 IEEE International Conference on Cluster Computing, pp. 1-10, Sep. 17-20, 2007. |
Infiniband Trade Association, “InfiniBand™ Architecture Specification”, release 1.2.1, pp. 1-1727, Jan. 2008. |
InfiniBand Architecture Specification, vol. 1, Release 1.2.1, pp. 1-1727, Nov. 2007. |
Deming, “Infiniband Architectural Overview”, Storage Developer Conference, pp. 1-70, year 2013. |
Fugger et al., “Reconciling fault-tolerant distributed computing and systems-on-chip”, Distributed Computing, vol. 24, Issue 6, pp. 323-355, Jan. 2012. |
Wikipedia, “System on a chip”, pp. 1-4, Jul. 6, 2018. |
Villavieja et al., “On-chip Distributed Shared Memory”, Computer Architecture Department, pp. 1-10, Feb. 3, 2011. |
Ben-Moshe et al., U.S. Appl. No. 16/750,019, filed Jan. 23, 2020. |
Chapman et al., “Introducing OpenSHMEM: SHMEM for the PGAS Community,” Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, pp. 1-4, Oct. 2010. |
Priest et al., “You've Got Mail (YGM): Building Missing Asynchronous Communication Primitives”, IEEE International Parallel and Distributed Processing Symposium Workshops, pp. 221-230, year 2019. |
Wikipedia, “Nagle's algorithm”, pp. 1-4, Dec. 12, 2019. |
U.S. Appl. No. 16/430,457 Office Action dated Jul. 9, 2021. |
Yang et al., “SwitchAgg: A Further Step Toward In-Network Computing,” 2019 IEEE International Conference on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking, pp. 36-45, Dec. 2019. |
EP Application # 20216972 Search Report dated Jun. 11, 2021. |
U.S. Appl. No. 16/789,458 Office Action dated Jun. 10, 2021. |
U.S. Appl. No. 16/750,019 Office Action dated Jun. 15, 2021. |
U.S. Appl. No. 17/147,487 Office Action dated Jun. 30, 2022. |
U.S. Appl. No. 17/147,487 Office Action dated Nov. 29, 2022. |
U.S. Appl. No. 17/495,824 Office Action dated Jan. 27, 2023. |
Number | Date | Country | |
---|---|---|---|
20200265043 A1 | Aug 2020 | US |
Number | Date | Country | |
---|---|---|---|
62807266 | Feb 2019 | US |