A device connected to a network, e.g., an Ethernet network, typically connects to a port on a network switch or hub. Network switches and hubs have a limited number of ports. Expanding the network to include a number of devices beyond the number of ports typically requires linking two or more switches or hubs. Redundant paths in the network are typically disabled by network protocol to prevent broadcast storms and loops in the topology. Making efficient use of such multiple-switch networks is a challenge.
In general, in one aspect, the invention features a method. A full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN.
Implementations of the invention may include one or more of the following. The full bisection bandwidth network may include a path A connecting node X and node Y and a path B connecting node X and node Y, such that standard Ethernet protocol would treat path A and path B as redundant paths. The method may further include aggregating path A and path B into a single trunk group so that path A and path B are active. The method may further include constructing the full bisection bandwidth network to have a fat tree topology. The method may further include constructing the full bisection bandwidth network to have a fully connected mesh topology. The plurality of nodes may include a root layer of N Ethernet switches and a branch layer of M Ethernet switches. M may be greater than N. The plurality of paths may include a path from each root layer switch to each branch layer switch. Assigning paths to the VLANs may include assigning paths from a first root layer switch to a first set of VLANs and assigning paths from a second root layer switch to a second set of VLANs, the first set of VLANs not containing any VLANs belonging to the second set of VLANs, and the second set of VLANs not containing any VLANs belonging to the first set of VLANs. M may equal 2N. Assigning paths to the VLANs may include assigning a path from branch layer switch BLS1 to root layer switch RLS1 to a first VLAN and assigning a path from branch layer switch BLS1 to root layer switch RLS2 to a second VLAN. Assigning paths to the VLANs may include assigning a path from branch layer switch BLS1 to root layer switch RLS1 to a first VLAN and assigning a path from branch layer switch BLS2 to root layer switch RLS1 to a second VLAN.
Assigning paths to the VLANs may include providing a first path PATH1 from a first branch layer switch BLS1 to a first root layer switch RLS1, providing a second path PATH2 from the first branch layer switch BLS1 to the first root layer switch RLS1, and aggregating PATH1 and PATH2 into a single trunk group. A plurality of servers may be coupled to the full bisection bandwidth network. The method may further include providing redundant paths from one of the plurality of servers to another of the plurality of servers. The method may further include providing redundant paths from each of the plurality of servers to the others of the plurality of servers. Assigning paths to the VLANs may include assigning a first path from branch layer switch BLS1 to root layer switch RLS1 to a first VLAN and assigning a second path redundant to the first path to the first VLAN. The plurality of paths may include a path from each root layer switch to each branch layer switch. A plurality of servers may be coupled to the branch layer of Ethernet switches. Assigning paths to the VLANs may include assigning a first path from a first server to a second server and assigning a second path redundant to the first path from the first server to the second server. Assigning paths to the VLANs may include assigning redundant paths from each of the plurality of servers through the branch layer of Ethernet switches and the root layer of Ethernet switches to the others of the plurality of servers.
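By way of illustration only, the following Python sketch shows one possible way of assigning root-to-branch paths to VLANs so that each VLAN is loop-free and every path is active in at least one VLAN. It assumes one VLAN per root layer switch, which is only one of the assignments contemplated above. The function names `assign_vlans` and `is_loop_free` and the switch counts are hypothetical and are not taken from any described embodiment.

```python
from collections import defaultdict


def assign_vlans(num_root, num_branch):
    """Map each VLAN id to the paths it carries (assumption: one VLAN per root switch)."""
    vlans = defaultdict(list)
    for r in range(1, num_root + 1):
        for b in range(1, num_branch + 1):
            # Every path leaving root switch r is placed in VLAN r.
            vlans[r].append((f"RLS{r}", f"BLS{b}"))
    return dict(vlans)


def is_loop_free(paths):
    """Return True if the paths contain no cycle (union-find check)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in paths:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # this path would close a loop
        parent[root_a] = root_b
    return True


vlans = assign_vlans(num_root=2, num_branch=4)
all_paths = {(f"RLS{r}", f"BLS{b}") for r in (1, 2) for b in range(1, 5)}
covered = {path for paths in vlans.values() for path in paths}
print(all(is_loop_free(p) for p in vlans.values()), covered == all_paths)
# prints: True True
```

In this sketch each VLAN is a star centered on its root layer switch, so no VLAN contains a loop, while the union of the VLANs covers every path in the network.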
In general, in another aspect, the invention features a system. The system includes a full bisection bandwidth network. The full bisection bandwidth network includes a plurality of nodes. The full bisection bandwidth network includes a plurality of paths among the nodes. The full bisection bandwidth network includes a plurality of Virtual Local Area Networks (“VLANs”) incorporating the plurality of nodes and the plurality of paths. Each VLAN satisfies a spanning tree protocol. All paths are active in at least one VLAN.
Implementations of the invention may include one or more of the following. The full bisection bandwidth network may include a path A connecting node X and node Y and a path B connecting node X and node Y, such that standard Ethernet protocol would treat path A and path B as redundant paths. Path A and path B may be aggregated into a single trunk group such that path A and path B are active. The full bisection bandwidth network may have a fat tree topology. The full bisection bandwidth network may have a fully connected mesh topology. The plurality of nodes may include a root layer of N Ethernet switches. The plurality of nodes may include a branch layer of M Ethernet switches. M may be greater than N. The plurality of paths among the nodes may include a path from each root layer switch to each branch layer switch. Paths from a first root layer switch may be assigned to a first set of VLANs. Paths from a second root layer switch may be assigned to a second set of VLANs. The first set of VLANs may not contain any VLANs belonging to the second set of VLANs. The second set of VLANs may not contain any VLANs belonging to the first set of VLANs. M may equal N. The plurality of paths among the nodes may include a path from branch layer switch BLS1 to root layer switch RLS1 assigned to a first VLAN and a path from branch layer switch BLS1 to root layer switch RLS2 assigned to a second VLAN. The plurality of paths among the nodes may include a path from branch layer switch BLS1 to root layer switch RLS1 assigned to a first VLAN and a path from branch layer switch BLS2 to root layer switch RLS1 assigned to a second VLAN. The plurality of paths among the nodes may include a first path PATH1 from a first branch layer switch BLS1 to a first root layer switch RLS1 and a second path PATH2 from the first branch layer switch BLS1 to the first root layer switch RLS1 that are aggregated into a single trunk group.
The system may further include a plurality of servers coupled to the full bisection bandwidth network. The plurality of paths among the nodes may include a plurality of redundant paths from one of the plurality of servers to another of the plurality of servers. The plurality of paths among the nodes may include a plurality of redundant paths from each of the plurality of servers to the others of the plurality of servers. The plurality of paths among the nodes may include a first path from branch layer switch BLS1 to root layer switch RLS1 assigned to a first VLAN and a second path redundant to the first path assigned to the first VLAN. The plurality of paths among the nodes may include a first path from a first server to a second server and a second path redundant to the first path from the first server to the second server. The plurality of paths among the nodes may include redundant paths from each of the plurality of servers through the branch layer of Ethernet switches and the root layer of Ethernet switches to the others of the plurality of servers.
In general, in another aspect, the invention features a method. The method includes providing a full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, that is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN. The full bisection bandwidth network carries a traffic load. The method includes balancing the traffic load among the paths.
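The manner of balancing the traffic load is not specified above. As a minimal sketch under that caveat, one conceivable approach is to select a VLAN for each source and destination pair with a stable hash, so that traffic is spread across the paths carried by the different VLANs. The function name `pick_vlan`, the use of CRC-32, the server names, and the VLAN identifiers are all assumptions made for illustration.

```python
import zlib


def pick_vlan(src, dst, vlan_ids):
    """Choose a VLAN for a (source, destination) pair using a stable hash."""
    key = f"{src}->{dst}".encode()
    return vlan_ids[zlib.crc32(key) % len(vlan_ids)]


vlan_ids = [1, 2, 3, 4]  # hypothetical VLAN identifiers
counts = {vlan: 0 for vlan in vlan_ids}
for s in range(16):
    for d in range(16):
        if s != d:
            counts[pick_vlan(f"server{s}", f"server{d}", vlan_ids)] += 1

# Number of server pairs mapped to each VLAN.
print(counts)
```

Because the hash is deterministic, a given pair of end points always uses the same VLAN, which keeps routes predictable while distributing distinct pairs over the available VLANs.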
In general, in another aspect, the invention features a method. The method includes providing a full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, that is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN. The method further includes adding a node. The method further includes adding paths to connect the added node to the full bisection bandwidth network and adjusting the assignments of paths and the added paths to VLANs such that each VLAN satisfies a spanning tree protocol, all paths are active in at least one VLAN, and the network remains a full bisection bandwidth network.
Implementations of the invention may include one or more of the following. Adjusting the assignments may include adding a new VLAN. Adjusting the assignments may include adding the added paths to the existing VLANs.
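The two adjustments just described can be sketched as follows, assuming, purely for illustration, that each VLAN is represented by one root layer switch and the branch layer switches it reaches. Adding a branch layer switch adds the new paths to the existing VLANs, while adding a root layer switch adds a new VLAN. The data layout and the function names `add_branch_switch` and `add_root_switch` are hypothetical. The sketch does not check port counts, so whether the network remains a full bisection bandwidth network would have to be verified separately.

```python
def add_branch_switch(vlans, new_branch):
    """Add the new paths to the existing VLANs: one path per root layer switch."""
    return {
        vlan_id: (root, branches + [new_branch])
        for vlan_id, (root, branches) in vlans.items()
    }


def add_root_switch(vlans, new_root):
    """Add a new VLAN carrying paths from the new root switch to every branch switch."""
    branches = sorted({b for _, bs in vlans.values() for b in bs})
    new_id = max(vlans) + 1
    return {**vlans, new_id: (new_root, branches)}


# VLAN id -> (root layer switch, branch layer switches reached in that VLAN)
vlans = {1: ("RLS1", ["BLS1", "BLS2"]), 2: ("RLS2", ["BLS1", "BLS2"])}
vlans = add_branch_switch(vlans, "BLS3")
vlans = add_root_switch(vlans, "RLS3")
print(vlans[3])
# prints: ('RLS3', ['BLS1', 'BLS2', 'BLS3'])
```

In both cases each VLAN remains a star centered on a single root layer switch, so no loop is introduced by the adjustment.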
The full bisection bandwidth network technique disclosed herein has particular application, but is not limited, to large databases that might contain many millions or billions of records managed by a database system (“DBS”) 100, such as a Teradata Active Data Warehousing System available from the assignee hereof.
For the case in which one or more virtual processors are running on a single physical processor, the single physical processor swaps between the set of N virtual processors.
For the case in which N virtual processors are running on an M-processor subsystem, the subsystem's operating system schedules the N virtual processors to run on its set of M physical processors. If there are 4 virtual processors and 4 physical processors, then typically each virtual processor would run on its own physical processor. If there are 8 virtual processors and 4 physical processors, the operating system would schedule the 8 virtual processors against the 4 physical processors, in which case swapping of the virtual processors would occur.
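The two cases above can be illustrated with a simple round-robin sketch. Actual operating system scheduling is considerably more involved; the function name `schedule` and the round-robin policy are assumptions used only to show when swapping occurs.

```python
def schedule(num_virtual, num_physical, time_slices):
    """Return, for each time slice, the virtual processors running on the physical ones."""
    plan = []
    running = min(num_virtual, num_physical)
    for t in range(time_slices):
        start = (t * num_physical) % num_virtual
        plan.append([(start + p) % num_virtual for p in range(running)])
    return plan


# 4 virtual processors on 4 physical processors: no swapping is needed.
print(schedule(4, 4, 2))  # prints: [[0, 1, 2, 3], [0, 1, 2, 3]]

# 8 virtual processors on 4 physical processors: the two groups are swapped.
print(schedule(8, 4, 2))  # prints: [[0, 1, 2, 3], [4, 5, 6, 7]]
```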
Each of the processing modules 110_1 . . . 110_N manages a portion of a database that is stored in a corresponding one of the data-storage facilities 120_1 . . . 120_N. Each of the data-storage facilities 120_1 . . . 120_N includes one or more disk drives. The DBS may include multiple subsystems 105_2 . . . 105_N in addition to the illustrated subsystem 105_1, connected by extending the network 115.
The system stores data in one or more tables in the data-storage facilities 120_1 . . . 120_N. The rows 125_1 . . . 125_Z of the tables are stored across multiple data-storage facilities 120_1 . . . 120_N to ensure that the system workload is distributed evenly across the processing modules 110_1 . . . 110_N. A parsing engine 130 organizes the storage of data and the distribution of table rows 125_1 . . . 125_Z among the processing modules 110_1 . . . 110_N. The parsing engine 130 also coordinates the retrieval of data from the data-storage facilities 120_1 . . . 120_N in response to queries received from a user at a mainframe 135 or a client computer 140. The DBS 100 usually receives queries and commands to build tables in a standard format, such as SQL.
In one implementation, the rows 125_1 . . . 125_Z are distributed across the data-storage facilities 120_1 . . . 120_N by the parsing engine 130 in accordance with their primary index. The primary index defines the columns of the rows that are used for calculating a hash value. The function that produces the hash value from the values in the columns specified by the primary index is called the hash function. Some portion, possibly the entirety, of the hash value is designated a “hash bucket”. The hash buckets are assigned to data-storage facilities 120_1 . . . 120_N and associated processing modules 110_1 . . . 110_N by a hash bucket map. The characteristics of the columns chosen for the primary index determine how evenly the rows are distributed.
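The row distribution just described can be sketched as follows. The hash function (CRC-32), the number of hash buckets, the bucket map layout, and the names `hash_bucket` and `build_bucket_map` are assumptions chosen for illustration and do not reflect the hash function actually used by any particular database system.

```python
import zlib

NUM_BUCKETS = 16  # hypothetical number of hash buckets (a power of two)


def hash_bucket(row, primary_index):
    """Hash the primary index columns and keep the low-order bits as the hash bucket."""
    key = "|".join(str(row[col]) for col in primary_index).encode()
    hash_value = zlib.crc32(key)
    return hash_value & (NUM_BUCKETS - 1)  # a portion of the hash value


def build_bucket_map(num_modules):
    """Assign each hash bucket to a processing module (simple modulo layout)."""
    return {bucket: bucket % num_modules for bucket in range(NUM_BUCKETS)}


rows = [{"id": i, "name": f"customer{i}"} for i in range(1000)]
bucket_map = build_bucket_map(num_modules=4)

load = [0] * 4
for row in rows:
    load[bucket_map[hash_bucket(row, ["id"])]] += 1

# Rows handled by each processing module.
print(load)
```

A primary index whose columns take many distinct values, such as `id` here, would be expected to spread rows fairly evenly, whereas a primary index on a column with few distinct values would concentrate rows in a few hash buckets.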
In addition to the physical division of storage among the storage facilities illustrated in
In one example system, the parsing engine 130 is made up of three components: a session control 200, a parser 205, and a dispatcher 210, as shown in
Once the session control 200 allows a session to begin, a user may submit a SQL query, which is routed to the parser 205. As illustrated in
The network 115 will continue to be described in the context of the system illustrated in
In one embodiment, the network 115 includes a network 405, such as that illustrated in
The network 405 illustrated in
A full bisection bandwidth network is realized when the network can be arbitrarily cut in half, such as by line 425, and the number of cut links is equal to the number of end points in each half. In
In a typical Ethernet configuration, the network 405 illustrated in
In the typical Ethernet network, one of the paths shown in
In one embodiment of a network 405, the IEEE 802.1q protocol is applied to divide a network subject to the spanning tree protocol into VLANs in order to keep all network paths active and available for traffic. For example, as shown in
In some embodiments, multiple point-to-point paths between nodes are used to achieve full bisection bandwidth. Typically, Ethernet protocol disables redundant links to prevent loops in the network. For example, consider the nodes 705 and 710 connected by redundant paths 715, 720, and 725 in
In one embodiment of a network 405, the IEEE 802.3ad protocol is applied, as shown in
The switches are configured using Multiple Spanning Tree Protocol (802.1q) and Link Aggregation Protocol (802.3ad) to provide a 60-port network with full bisection bandwidth. All paths in the network are active at all times to carry traffic, and there are 6 distinct paths for each source end point to inject traffic into the network to reach any destination end point.
The network is configured using Multiple Spanning Tree Protocol (802.1q) and Link Aggregation Protocol (802.3ad) to provide a 92-port network with full bisection bandwidth. All links in the network are active at all times to carry traffic, and there are 4 distinct paths for each source node to inject traffic into the network to reach any destination node. Two ports in each switch are dedicated to system management.
The scripts used to accomplish this configuration with the network 1405 shown in
The network in
Tables 1 and 2 contain statistics collected from the network 1405 without and with the configuration described above. Table 1 shows the number of bytes transmitted through one of the end points and the number of dropped packets when the network is configured with a single VLAN and is allowed to self-configure what it considers to be its best topology:
Table 2 shows network statistics collected when the network is configured into a full bisection bandwidth topology, as described herein.
In order to achieve a topology with full bisection bandwidth and predictable routes between sources and destinations, the network is cabled with strict connectivity for full bisection bandwidth, and each switch element is configured to achieve a complete network. Each switch element is configured to meet Multiple Spanning Tree Protocol (802.1q) (“MSTP”) and Link Aggregation Protocol (802.3ad) (“LAG”) requirements. The configuration allows the MSTP and LAG protocols to automatically produce the desired network. The following are the principal configuration settings used to meet MSTP and LAG requirements:
The resulting configuration:
As illustrated in
Configuration of the branch layer switches, as illustrated in
Configuration of the root layer switches, as illustrated in
The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.