The present invention relates generally to the field of advanced networking architectures in parallel processing systems, large-scale switching systems, and parallel storage network systems. More specifically, the present invention relates to an interlaced bypass torus (iBT) network that is built by systematically interlacing bypass links into torus or mesh networks.
Advanced networking architectures have enabled parallel computers to keep breaking performance barriers, and this progress has stimulated the parallel computing community's ambition to invent more scalable interconnection networks that accommodate the ever-increasing demands for performance and functionality by incorporating millions of powerful processor cores. A scalable interconnection network of a fixed node degree must satisfy most of the following performance requirements: small diameter, large bisection width, topological simplicity, a high degree of symmetry, design modularity, engineering feasibility, and expandability. For example, 3D torus networks such as those in IBM's Blue Gene and Cray's T3E, with up to 20 thousand nodes, and several small-scale hypercube networks satisfy several of these requirements. However, the network diameter grows too fast for a torus, a hypercube, and many of their derivatives, and this rapid growth greatly limits the expandability of these networks. Mesh networks of fixed dimension provide an alternative with relatively low node degree and low engineering complexity, but with large network diameter and small overall bandwidth. Other efforts to increase bandwidth without increasing network diameter include the hybrid fat-tree, a low-cost, low-degree network with irregular node degree; however, it is susceptible to faulty links and to message contention towards the roots. Other proposals have also been introduced, such as the incomplete torus and its derivatives, which reduce node degree at the expense of symmetry and topological simplicity. Honeycomb mesh and torus networks received considerable early attention that faded quickly due to implementation obstacles, among other difficulties. Hexagonal networks also offer a small diameter but carry the burden of a high node degree.
Modifications of the traditional torus, including the Packed Exponential Connections, Shifted Recursive Torus, TESH, and Recursive Diagonal Torus networks, all build upon the simplicity of mesh and torus networks; they achieve improved network properties, but at the price of unfavorable expandability and network cost. Nevertheless, these variants demonstrated that interlacing rings of various lengths into a torus network is a profitable practice for improving network performance without adding significant engineering complexity.
Motivated by this and by several other needs of massive storage systems, we invent a class of iBT networks. The iBT network is constructed by interlacing bypass links evenly into a torus or mesh network. We preserve the simplicity of the grid-like layout and improve the performance of the network with a minimal number of bypass links. Our model allows the bypass construction on the base torus to be generalized to arbitrary dimensions, yielding much larger and more scalable networks than 2D ones. This new network achieves a low network diameter, high bisection width, short node-to-node distances, quick collective operations including broadcast, and low engineering complexity in terms of network cost. Furthermore, the iBT network has a much lower node degree and lower network cost than a hypercube of similar network size. To ensure network symmetry and modularity, we interlace rings into the torus network consistently.
The objective of the present invention is to provide a class of d-dimensional interlaced bypass torus (iBT) networks, where d≧2, for the interconnection network of parallel processing systems and the interconnection network of massive parallel storage network systems. The d-dimensional iBT network is generated from a general d-dimensional torus or mesh network by adding a pair of bypass links to each processing element (PE). The distinguishing features of the iBT networks include: the iBT network can be extended to a general d-dimensional case (d≧2); only one pair of bypass links is added to each PE, along a single dimension; and the selection of bypass link lengths does not change as the system size increases.
The other objective is to provide a parallel processing system that uses the iBT network as its interconnection network. The parallel processing system can be a parallel computer consisting of a plurality of processing elements or a parallel storage system consisting of a plurality of storage elements. In the interconnection network of parallel computers, each processing element is a processor, a processor core, or an integrated compute node of the parallel processing system. Such an element performs data processing and message switching with other elements. In the interconnection network of storage systems, each storage element is one or several disks of various types, or any data storage device, together with its network controller. Such an element provides storage resources to the network for primary data store, mirror data store, backup data store, and data access (read and write).
The present invention provides a class of interlaced bypass torus (iBT) networks and a parallel processing system built on the novel iBT network, in order to improve the interconnection network design of parallel processing systems.
With the illustrative clarity of the
Definition of the iBT network: A general d-dimensional iBT network (d≧2) that starts from a torus network of dimensions N1×N2× . . . ×Nd is denoted as iBT(N1×N2× . . . ×Nd; L=m; b=b1, b2, . . . , bk).
In this notation, b=b1, b2, . . . , bk is referred to as the bypass scheme, and it is a strictly increasing vector, i.e., b1<b2< . . . <bk. L=m; b=b1, b2, . . . , bk means that we interlace bi-hop bypass rings (i=1, . . . , k) recursively into the first m dimensions (m≦d). When k=1, b=(b1) is referred to as a uniform bypass scheme; otherwise, b=b1, b2, . . . , bk (k≧2) is referred to as a mixed bypass scheme.
In an iBT network, each coordinate x=(x1, x2, . . . , xd) represents a processing element (PE), where xj∈[0, Nj−1] is an integer and j∈[1, d]. A PE is a unit that is able to perform data processing and message switching with other processing elements.
In this d-dimensional iBT network as defined in [0012], each PE interconnects 2d+2 other PE's so the node degree, defined as the number of links from a PE to its neighbors, is 2d+2 where 2d and 2 are from the torus and bypass links, respectively.
Definition of a torus neighbor of a PE x=(x1, x2, . . . , xd): Let x=(x1, x2, . . . , xd) and y=(y1, y2, . . . , yd) be two PE's in the iBT network. The torus distance Dt(x, y) between these two PE's is defined as $D_t(x,y)=\sum_{j=1}^{d}\min\{|x_j-y_j|,\; N_j-|x_j-y_j|\}$.
There is a torus link that interconnects x and y if and only if Dt(x,y)=1; otherwise, there is no torus link that interconnects these two PE's. y is a torus neighbor of x if and only if there is a torus link that interconnects x and y. Therefore, the same as in a torus network, each PE in the iBT network has 2d torus neighbors by torus links.
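The torus distance and the torus-neighbor rule can be illustrated with a minimal Python sketch. The function names `torus_distance` and `torus_neighbors` are hypothetical and chosen only for this illustration; PEs are represented as tuples of integer coordinates, and every Nj is assumed to be at least 3 so that the 2d neighbors are distinct.

```python
def torus_distance(x, y, dims):
    """D_t(x, y): per dimension, take the shorter way around the ring."""
    return sum(min(abs(a - b), n - abs(a - b)) for a, b, n in zip(x, y, dims))


def torus_neighbors(x, dims):
    """PEs one torus hop away from x (2d of them when every N_j >= 3)."""
    found = []
    for j, n in enumerate(dims):
        for step in (1, -1):
            # Step +1 / -1 along dimension j with wraparound.
            found.append(x[:j] + ((x[j] + step) % n,) + x[j + 1:])
    return found


dims = (8, 8)   # base torus of the 8x8 embodiments
x = (0, 0)
print(torus_distance(x, (7, 1), dims))
print(torus_neighbors(x, dims))
```

Every PE returned by `torus_neighbors` is at torus distance 1 from x, which matches the "if and only if Dt(x,y)=1" rule above.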
Definition of a bypass neighbor of a PE x=(x1, x2, . . . , xd): To determine the one pair of bypass links for x=(x1, x2, . . . , xd), we introduce two terms: a node bypass dimension bd(x)∈[1, . . . , m] and a node bypass length bl(x)∈{b1, . . . , bk}, which can be expressed as bd(x)=[s (mod m)]+1 and bl(x)=bh, where $s=\sum_{l=1}^{m}x_l$ and $h=[\lfloor s/m\rfloor \pmod{k}]+1$.
That is, two bl(x)-hop bypass links are added to x, one in each direction, along the dimension bd(x). Here, [α (mod β)] means the remainder on division of α by β, and ⌊α⌋ means the largest integer not greater than α.
A PE y is a bypass neighbor of x if and only if there is a bypass link that interconnects x and y.
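As a hedged illustration of how a PE's pair of bypass links could be computed for a torus base network, the following Python sketch takes s as the sum of the first m coordinates (as stated for the more general case) and computes the index h as [⌊s/m⌋ (mod k)]+1. That rule for h is an assumption of this sketch, chosen to agree with the floor notation and with Condition II. The function names are hypothetical.

```python
def bypass_dimension(x, m):
    """bd(x) = [s (mod m)] + 1, with s the sum of the first m coordinates."""
    s = sum(x[:m])
    return s % m + 1            # 1-based dimension index


def bypass_length(x, m, b):
    """bl(x) = b_h; ASSUMPTION: h = [floor(s/m) (mod k)] + 1."""
    s = sum(x[:m])
    h = (s // m) % len(b) + 1
    return b[h - 1]


def bypass_neighbors(x, dims, m, b):
    """The two PEs reached by the bl(x)-hop links along dimension bd(x)."""
    j = bypass_dimension(x, m) - 1
    length = bypass_length(x, m, b)
    return [x[:j] + ((x[j] + sign * length) % dims[j],) + x[j + 1:]
            for sign in (1, -1)]


# Configuration of the 30x30x36 embodiment: L=3, b=6,12.
dims, m, b = (30, 30, 36), 3, (6, 12)
x = (1, 2, 0)
print(bypass_dimension(x, m), bypass_length(x, m, b))
print(bypass_neighbors(x, dims, m, b))
```

Under this assumed rule, each bypass neighbor of x has the same bypass dimension and bypass length as x, so the link is seen identically from both ends.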
Qualification conditions for a bypass scheme b=b1, b2 . . . , bk: iBT(N1×N2× . . . ×Nd; L=m; b=b1, b2 . . . , bk) is a qualified iBT network if and only if its configurations satisfy the following four conditions as in [0021] to [0024]:
Condition I: Nl≡0 (mod m) where l∈[1, m];
Condition II: bi≡0 (mod mk) where i∈[1, k];
Condition III: bh≡0 (mod bi) where 1≦i≦h≦k;
Condition IV: [Nl (mod b1)]≡0 (mod mk) where l∈[1, m].
Here Condition I ensures that bypass rings are interlaced in the first m directions. Conditions II and III ensure that a bypass link always interconnects a pair of PE's of both the same bypass dimension and the same bypass length. Condition IV ensures that all of the bypass links form rings in a mixed bypass scheme. Here α≡γ (mod β) means [α (mod β)]=[γ (mod β)].
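The four conditions translate directly into a small checker. The sketch below is illustrative only; the function name `is_qualified` is hypothetical, and it additionally tests that the bypass scheme is strictly increasing, as required by the notation.

```python
def is_qualified(dims, m, b):
    """Check Conditions I-IV for iBT(N1 x ... x Nd; L=m; b=b1,...,bk)."""
    k = len(b)
    first = dims[:m]                                  # N_1 .. N_m
    increasing = all(b[i] < b[i + 1] for i in range(k - 1))
    cond1 = all(n % m == 0 for n in first)            # N_l = 0 (mod m)
    cond2 = all(bi % (m * k) == 0 for bi in b)        # b_i = 0 (mod mk)
    cond3 = all(b[h] % b[i] == 0                      # b_h = 0 (mod b_i), i <= h
                for i in range(k) for h in range(i, k))
    cond4 = all((n % b[0]) % (m * k) == 0 for n in first)
    return increasing and cond1 and cond2 and cond3 and cond4


print(is_qualified((8, 8), 2, (4,)))
print(is_qualified((30, 30, 36), 3, (6, 12)))
print(is_qualified((8, 8), 2, (3,)))   # b1=3 violates Condition II
```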
The other definition of the iBT network is: A general d-dimensional iBT network (d≧2) that starts from a mesh network of dimensions N1×N2× . . . ×Nd is denoted as iBT(N1×N2× . . . ×Nd; L=m; b=b1, b2, . . . , bk).
The notations L=m; b=b1, b2 . . . , bk as used in the definition of [0026] have the same meanings as defined in [0013].
Definition of a mesh neighbor of a PE x=(x1, x2, . . . , xd): Let x=(x1, x2, . . . , xd) and y=(y1, y2, . . . , yd) be two PE's in this iBT network. The mesh distance Dm(x, y) between these two PE's is defined as $D_m(x,y)=\sum_{j=1}^{d}|x_j-y_j|$.
There is a mesh link that interconnects x and y if and only if Dm(x,y)=1; otherwise, there is no mesh link that interconnects these two PE's. y is a mesh neighbor of x if and only if there is a mesh link that interconnects x and y. Therefore, the same as in a mesh network, each PE in the iBT network has, at most, 2d mesh neighbors by mesh links.
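For comparison with the torus case, a minimal Python sketch of the mesh distance and mesh neighbors follows. The function names are hypothetical; the only difference from a torus is that there is no wraparound, so PEs on the boundary have fewer than 2d mesh neighbors.

```python
def mesh_distance(x, y):
    """D_m(x, y): sum of absolute coordinate differences (no wraparound)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def mesh_neighbors(x, dims):
    """PEs one mesh hop away from x; boundary PEs have fewer than 2d."""
    found = []
    for j, n in enumerate(dims):
        for step in (1, -1):
            v = x[j] + step
            if 0 <= v < n:                 # stay inside the mesh
                found.append(x[:j] + (v,) + x[j + 1:])
    return found


dims = (8, 8)
print(mesh_distance((0, 0), (7, 1)))
print(mesh_neighbors((0, 0), dims))   # corner PE
print(mesh_neighbors((3, 3), dims))   # interior PE
```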
The definition of a bypass neighbor for the iBT network defined in [0026] is the same as in [0018] to [0019]. The qualification conditions for a bypass scheme defined in [0021] through [0024] still apply to the iBT network defined in [0026].
The only difference between the definition in [0012] and the definition in [0026] is the choice of base network: the former iBT network in [0012] uses a torus as its base network, and the latter one in [0026] uses a mesh.
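Definition of a more general iBT network: a more general d-dimensional iBT network is denoted as: $\mathrm{iBT}(N_1\times\cdots\times N_d;\ L=m;\ b_1=b_{11},\ldots,b_{1k_1};\ \ldots;\ b_m=b_{m1},\ldots,b_{mk_m})$.
In this notation, $b_l=b_{l1},\ldots,b_{lk_l}$ is the bypass scheme for the l-th dimension, where l∈[1, m]. When b1=b2= . . . =bm=b, the more general case becomes the case defined in [0012].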
In this more general case, the base network of N1× . . . ×Nd can be either a torus or a mesh network.
In this more general case, if the base network is a torus network, each PE has 2d+2 neighbors, namely 2d torus neighbors and 2 bypass neighbors connected by torus and bypass links, respectively; if the base network is a mesh network, each PE has 2 bypass neighbors and, at most, 2d mesh neighbors.
In this more general case, the node bypass dimension is expressed as bd(x)=[s (mod m)]+1, where $s=\sum_{l=1}^{m}x_l$, and the node bypass length is expressed as $bl(x)=b_{lh}$ with l=bd(x), where $h=[\lfloor s/m\rfloor \pmod{k_l}]+1$.
That is, two bl(x)-hop bypass links are added to x, one in each direction, along the dimension bd(x).
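A hedged sketch of the selection rule in the more general case is given below. It assumes that the scheme used by a PE is the one belonging to its bypass dimension and that the index within that scheme is [⌊s/m⌋ (mod k)]+1, with k the length of that scheme; both points are assumptions of this sketch, and the function name `bypass_link_general` is hypothetical.

```python
def bypass_link_general(x, m, schemes):
    """Return (bd(x), bl(x)) when each of the first m dimensions has its own scheme.

    schemes[l - 1] is the bypass scheme of dimension l.
    ASSUMPTIONS: the scheme of dimension bd(x) is the one applied, and the
    index into it is [floor(s/m) (mod k)] + 1 with k the length of that scheme.
    """
    s = sum(x[:m])
    l = s % m + 1                 # node bypass dimension, 1-based
    scheme = schemes[l - 1]
    h = (s // m) % len(scheme) + 1
    return l, scheme[h - 1]


# Configuration of the 8x8 embodiment with L=2, b1=2, b2=4.
schemes = [(2,), (4,)]
for pe in [(0, 0), (1, 0), (1, 1), (3, 2)]:
    print(pe, bypass_link_general(pe, 2, schemes))
```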
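Qualification conditions for the bypass scheme defined in [0032]:
$\mathrm{iBT}(N_1\times\cdots\times N_d;\ L=m;\ b_1=b_{11},\ldots,b_{1k_1};\ \ldots;\ b_m=b_{m1},\ldots,b_{mk_m})$ is a qualified iBT network if and only if its configuration satisfies the following four conditions: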
Condition I: $N_l\equiv 0 \pmod{m}$ where $l\in[1,m]$;
Condition II: $b_{li}\equiv 0 \pmod{m k_l}$ where $i\in[1,k_l]$;
Condition III: $b_{lh}\equiv 0 \pmod{b_{li}}$ where $1\le i\le h\le k_l$;
Condition IV: $[N_l \pmod{b_{l1}}]\equiv 0 \pmod{m k_l}$ where $l\in[1,m]$.
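The per-dimension conditions can be checked in the same way as the single-scheme ones. The following Python sketch is illustrative; the function name `is_qualified_general` is hypothetical, and the schemes are passed as a list holding one tuple per dimension l=1, . . . , m.

```python
def is_qualified_general(dims, m, schemes):
    """Check Conditions I-IV with one bypass scheme per dimension l = 1..m."""
    if len(schemes) != m:
        return False
    for n, b in zip(dims[:m], schemes):
        k = len(b)                                    # k_l
        if n % m != 0:                                # Condition I
            return False
        if any(bi % (m * k) != 0 for bi in b):        # Condition II
            return False
        if any(b[h] % b[i] != 0                       # Condition III
               for i in range(k) for h in range(i, k)):
            return False
        if (n % b[0]) % (m * k) != 0:                 # Condition IV
            return False
    return True


print(is_qualified_general((8, 8), 2, [(2,), (4,)]))
print(is_qualified_general((30, 30, 36), 3, [(6, 12)] * 3))
print(is_qualified_general((8, 8), 2, [(2,), (3,)]))   # 3 violates Condition II
```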
An exemplary embodiment of 2D iBT(8×8; L=2; b=4), or equally iBT(8×8; L=2; b1=4, b2=4), is shown in
An exemplary embodiment of 2D iBT(8×8; L=2; b1=2, b2=4) is shown in
An exemplary embodiment of 3D iBT(30×30×36; L=3; b=6,12), or equally iBT(30×30×36; L=3; b1=6,12, b2=6,12, b3=6,12), is shown in
An exemplary embodiment of 3D iBT(9×9×9; L=3; b=3), or equally iBT(9×9×9; L=3; b1=3, b2=3, b3=3), is shown in
According to the present iBT networks as defined in [0012], [0026] and [0032], N processing elements are able to be integrated as a whole parallel processing system where $N=\prod_{j=1}^{d}N_j$. These processing elements are interconnected as in the iBT network. In other words, the interconnection network of the parallel processing system is built using the iBT network. Each processing element performs data processing and message switching with its torus (or mesh) and bypass neighbors. A processing element can be a processor core, a processor or an integrated compute node with a router or switch for message switching.
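To make the system-level picture concrete, the following Python sketch builds the adjacency of a torus-based iBT(9×9×9; L=3; b=3) network, counts its PEs, and measures node degree and diameter by breadth-first search. It is a sketch under stated assumptions: the helper names are hypothetical, and the index into the bypass scheme is computed as ⌊s/m⌋ (mod k), which is an assumption of this illustration (it has no effect here because k=1).

```python
from collections import deque
from itertools import product
from math import prod


def neighbors(x, dims, m, b):
    """Torus neighbors plus the pair of bypass neighbors of PE x."""
    out = set()
    for j, n in enumerate(dims):                      # 2d torus links
        for step in (1, -1):
            out.add(x[:j] + ((x[j] + step) % n,) + x[j + 1:])
    s = sum(x[:m])
    j = s % m                                         # bd(x) - 1
    length = b[(s // m) % len(b)]                     # ASSUMED index rule
    for sign in (1, -1):                              # one pair of bypass links
        out.add(x[:j] + ((x[j] + sign * length) % dims[j],) + x[j + 1:])
    return out


def eccentricity(src, adj):
    """Largest hop count from src to any PE, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


dims, m, b = (9, 9, 9), 3, (3,)
nodes = list(product(*(range(n) for n in dims)))
adj = {x: neighbors(x, dims, m, b) for x in nodes}
N = prod(dims)                                        # N = N1 * N2 * ... * Nd
degrees = {len(adj[x]) for x in nodes}
diameter = max(eccentricity(x, adj) for x in nodes)
print(N, degrees, diameter)
```

Because the bypass links are added on top of the torus links, the measured diameter cannot exceed that of the 9×9×9 base torus, which is 12.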
According to the present iBT network as defined in [0012], [0026] and [0032], N data storage elements are able to be integrated as a whole massive storage network system where $N=\prod_{j=1}^{d}N_j$. These storage elements are interconnected as in the iBT network. In other words, the interconnection network of massive storage network systems is built using the iBT network. Each storage element provides storage resources to the network for primary data store, mirror data store, backup data store and data access. A storage element can be one or several disks of various types, or any data storage device, and a network controller for providing data access: read and write.