The present invention relates to the field of networks. More particularly, this invention relates to reliability of networks.
An interconnect fabric provides for communication among a set of nodes in a network. Communications originate within the network at a source node and terminate at a terminal node. Thus, a wide variety of networks may be viewed as a set of source nodes that communicate with a set of terminal nodes via an interconnect fabric. For example, a storage area network may be arranged as a set of computers as source nodes which are connected to a set of storage devices as terminal nodes via an interconnect fabric that includes communication links and devices such as hubs, routers, switches, etc. Devices such as hubs, routers, switches, etc., are hereinafter referred to as interconnect devices. Depending on the circumstances, a node may assume the role of source node with respect to some communications and of terminal node for other communications.
The communication requirements of an interconnect fabric may be characterized in terms of a set of flow requirements. A typical set of flow requirements specifies the required communication bandwidth from each source node to each terminal node. The design of an interconnect fabric usually involves selecting the appropriate arrangement of physical communication links and interconnect devices and related components that will meet the flow requirements.
An interconnect fabric that meets the minimum flow requirements under ideal conditions will not necessarily meet the flow requirements under other conditions, such as in the event of a failure of a communication link, interconnect device or related component. Therefore, network designers typically address these reliability considerations by building in excess capacity or redundancy to help meet flow requirements under adverse conditions. Prior techniques are largely ad hoc and, thus, tend to be time-consuming and error-prone and may result in an over-provisioned interconnect fabric.
A technique is disclosed for providing reliability to an interconnect fabric for communication among a set of nodes. The technique may be used to efficiently and programmatically produce a cost-effective interconnect fabric having a degree of reliability over a range of design problems.
In one aspect, reliability is provided to an interconnect fabric for communication among a set of nodes. Ports associated with each node are partitioned into a first set of ports and a second set of ports. A primary interconnect fabric is formed among the first set of ports in response to a set of flow requirements. A backup interconnect fabric is formed among the second set of ports. The backup interconnect fabric carries a portion of communications carried by the primary fabric so as to protect against occurrence of a failure in the primary fabric.
In another aspect, reliability is provided to an interconnect fabric for communication among a set of nodes. One or more failure modes are identified in a primary interconnect fabric that carries communications among the set of nodes via a first set of ports of the nodes. A backup interconnect fabric is formed among a second set of ports of the nodes for carrying a portion of the communications of the primary fabric so as to protect against occurrence of any single one of the failure modes of the primary fabric.
In yet another aspect, reliability is provided to a design for an interconnect fabric for communication between a set of nodes. A set of design information includes a set of flow requirements for the interconnect fabric. A fabric design tool generates a primary design for the interconnect fabric among a first set of ports of the nodes. The primary design is in response to the flow requirements. The design tool also generates a backup design for the interconnect fabric among a second set of ports of the nodes. The backup design carries a portion of communications carried by the primary fabric so as to protect against occurrence of any single one of the failure modes of the primary fabric.
The backup interconnect fabric may be formed by generating arrangements of flow sets in response to the flow requirements, determining feasibility of merging pairs of candidate flow sets and merging a pair of the flow sets. The feasibility of merging candidate flow sets may depend on a sum of flow requirements in the candidate flow sets that are interrupted by a single failure in the primary fabric or may depend on a highest sum of flow requirements in the candidate flow sets that are interrupted by different failures in the primary interconnect fabric.
The present invention is described with respect to particular exemplary embodiments thereof and reference is accordingly made to the drawings in which:
In a step 102, a set of nodes to be interconnected by an interconnect fabric, and flow requirements among the nodes, are determined. Table 1 shows an example set of flow requirements for an interconnect fabric under design.
The flow requirements in this example specify two source nodes (source nodes 40–42 in the figures below) and three terminal nodes (terminal nodes 50–54 in the figures below). If an interconnect fabric is to meet the flow requirements, it must contain communication paths between all pairs of the source and terminal nodes 40–42 and 50–54 having positive flow requirements and must have sufficient bandwidth to support all of the flow requirements simultaneously.
In one embodiment, the source nodes 40–42 are host computers and terminal nodes 50–54 are storage devices. Thus, the interconnect fabric under design may be a storage area network.
The bandwidth values for flows a, b, c, d, e and f may be numbers expressed in units of megabits per second (Mb/s). For this example, assume that each of the flows a, b, c, d, e and f has a bandwidth requirement of 33 Mb/s.
In other embodiments, there may be multiple flow requirements between a given source and terminal node pair. In such embodiments, the cells of Table 1 would contain a list of two or more entries. And, depending on the circumstances, a node may assume the role of source node with respect to some communications and of terminal node for other communications.
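As a minimal sketch, the flow requirements of the example may be held as a list of records, one per flow. The assignment of flows a through f to particular source and terminal node pairs below is an assumption made for illustration (it is Table 1 that defines the actual assignment), and the class name `Flow`, its fields and the helper `total_bandwidth_at` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    name: str
    source: int       # reference numeral of the source node
    terminal: int     # reference numeral of the terminal node
    bandwidth: int    # required bandwidth in Mb/s


# Assumed mapping of flows to node pairs; each flow requires 33 Mb/s.
FLOWS = [
    Flow("a", 40, 50, 33), Flow("b", 40, 52, 33), Flow("c", 40, 54, 33),
    Flow("d", 42, 50, 33), Flow("e", 42, 52, 33), Flow("f", 42, 54, 33),
]


def total_bandwidth_at(node, flows):
    """Sum the bandwidth of every flow that starts or ends at the node."""
    return sum(f.bandwidth for f in flows if node in (f.source, f.terminal))
```

Because the requirements are a list rather than a table keyed by node pair, two or more flows between the same source and terminal node can be represented simply by adding further records.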
At step 104, the ports of each node may be partitioned into sets. For example, the ports at each node may be divided into two sets. In one embodiment, a first set includes all of the ports for each node, save one, and a second set includes the remaining port not assigned to the first set. In other embodiments, the ports of each node could be further divided into an additional number of (k) sets, in which case additional fabrics may be used to interconnect the additional sets of ports to provide even greater redundancy and reliability.
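The two-set partition of step 104 may be sketched as follows. This is an illustration only: the function name `partition_ports` and the port labels are hypothetical, and the sketch covers only the embodiment in which a single port per node is reserved for the second set.

```python
def partition_ports(ports):
    """Split a node's ports into (first_set, second_set).

    The first set holds every port save one; the second set holds the
    one remaining port, as in the two-set embodiment of step 104.
    """
    if len(ports) < 2:
        raise ValueError("a node needs at least two ports to be partitioned")
    return list(ports[:-1]), [ports[-1]]


# Hypothetical port labels for a node having four ports.
first_set, second_set = partition_ports(["p0", "p1", "p2", "p3"])
```

Here `first_set` would be used for the primary fabric and `second_set` for the backup fabric.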
In the example, a first set of ports includes one port of each of the nodes 40 and 42 and three ports of each of the nodes 50, 52 and 54. A second set of ports includes one port of each of the nodes 40, 42, 50, 52 and 54. The first set includes those ports to the left of a dotted line (shown in
In a step 106 (
The method 200 partitions the flow requirements of the interconnect fabric into flow sets and iteratively merges the flow sets while taking into account the feasibility and cost of implementing the interconnect fabric.
At step 202, an arrangement of flow sets in the interconnect fabric is determined in response to the set of flow requirements for the source and terminal nodes. In one embodiment, step 202 is performed by generating a flow set for each flow specified in the flow requirements for the interconnect fabric. Thus, each of flows a, b, c, d, e and f of the example is initially included in a corresponding flow set having one flow.
At step 204, port violations which are associated with the arrangement of flow sets among the first set of ports are determined. In the example, port violations are determined for the first set of ports for each source node 40–42 and each terminal node 50–54. In general, the number of port violations is equal to the sum, over all flow sets, of the number of required physical communication links to the node from that flow set, minus the number of available ports in the set of ports at the node. Each flow set may require one or more physical communication links to a given source or terminal node in the network.
In this example (
In other examples, the number of available ports in the first set for the source nodes 40–42 and the terminal nodes 50–54 may differ and the number of physical communication links required by a flow set on a given source or terminal node it connects to may exceed one.
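The port violation count of step 204 may be sketched as below. The sketch assumes that each flow set requires exactly one physical communication link to each node it touches, that a surplus of ports counts as zero violations, and that flows a through f are assigned to node pairs as shown; the function name `port_violations` and these assignments are illustrative assumptions rather than part of the described method.

```python
from collections import Counter


def port_violations(flow_sets, available_ports):
    """Return {node: violations} for one set of ports.

    flow_sets: list of sets of (source, terminal) pairs.
    available_ports: {node: number of ports in the set being considered}.
    Assumes one link per flow set per node it touches.
    """
    links = Counter()
    for flow_set in flow_sets:
        touched = {node for pair in flow_set for node in pair}
        for node in touched:
            links[node] += 1
    # A node with spare ports has zero violations (assumed clamp).
    return {node: max(0, links[node] - ports)
            for node, ports in available_ports.items()}


# One flow set per flow (step 202), with an assumed flow-to-node mapping.
initial_flow_sets = [{(40, 50)}, {(40, 52)}, {(40, 54)},
                     {(42, 50)}, {(42, 52)}, {(42, 54)}]
first_set_ports = {40: 1, 42: 1, 50: 3, 52: 3, 54: 3}
violations = port_violations(initial_flow_sets, first_set_ports)
```

Under these assumptions each source node initially needs three links but has one port in the first set, giving a violation of two at each source node and none at the terminal nodes.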
At step 206 (
In the current state of the example interconnect fabric shown in
The candidate pairs of flow sets considered at step 206 must be feasible to merge. An example of a pair of flow sets that is not feasible to merge is a pair for which an interconnect device of sufficient bandwidth is not available. For example, a flow set having 60 units of bandwidth cannot be merged with a flow set having 50 units of bandwidth if the highest bandwidth interconnect device available has 100 units. Another example of a pair of flow sets that is not feasible to merge is a pair that would exceed the available ports on every available interconnect device of the resulting flow set. Candidate pairs that are not feasible to merge are bypassed at step 206 in favor of other candidate pairs.
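The two feasibility conditions described above may be sketched as a single check. The names `DeviceType` and `merge_is_feasible`, and the port count used in the usage lines, are hypothetical; the sketch simply asks whether at least one available interconnect device has enough bandwidth and enough ports for the merged flow set.

```python
from typing import NamedTuple


class DeviceType(NamedTuple):
    max_bandwidth: int   # units of bandwidth the device can carry
    ports: int           # number of ports on the device


def merge_is_feasible(bandwidth_a, bandwidth_b, ports_needed, device_types):
    """True if some available device can carry both flow sets at once."""
    total = bandwidth_a + bandwidth_b
    return any(total <= d.max_bandwidth and ports_needed <= d.ports
               for d in device_types)


# The 60-unit and 50-unit flow sets of the text, against a 100-unit device
# (the 8-port figure is an assumed value for illustration).
devices = [DeviceType(max_bandwidth=100, ports=8)]
blocked = merge_is_feasible(60, 50, ports_needed=3, device_types=devices)
allowed = merge_is_feasible(33, 33, ports_needed=3, device_types=devices)
```

With these inputs `blocked` is false because 110 units exceed the 100-unit device, while `allowed` is true.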
If port violations still exist in the interconnect fabric after step 206, then another candidate pair of flow sets is selected and merged in a repeat of step 206. The method 200 loops through steps 204–206 in an iterative fashion until all port violations are eliminated or until no further merges are feasible.
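The loop through steps 204–206 may be sketched as a simple greedy procedure. This is not presented as the method 200 itself: the sketch ignores cost and device port limits, always takes the feasible merge that leaves the fewest port violations, and assumes a particular assignment of flows a through f to node pairs. All names are hypothetical.

```python
from itertools import combinations

# name: (source, terminal, Mb/s); the node assignment is an assumption.
FLOWS = {"a": (40, 50, 33), "b": (40, 52, 33), "c": (40, 54, 33),
         "d": (42, 50, 33), "e": (42, 52, 33), "f": (42, 54, 33)}
PORTS = {40: 1, 42: 1, 50: 3, 52: 3, 54: 3}   # first set of ports
DEVICE_BANDWIDTH = 100                         # Mb/s


def total_violations(flow_sets):
    """Sum of port violations over all nodes, one link per flow set."""
    total = 0
    for node, ports in PORTS.items():
        links = sum(1 for fs in flow_sets
                    if any(node in FLOWS[f][:2] for f in fs))
        total += max(0, links - ports)
    return total


def bandwidth(flow_set):
    return sum(FLOWS[f][2] for f in flow_set)


def merge_until_clean(flow_sets):
    """Merge pairs until no violations remain or no merge helps."""
    flow_sets = list(flow_sets)
    while total_violations(flow_sets) > 0:
        best = None
        for a, b in combinations(flow_sets, 2):
            if bandwidth(a) + bandwidth(b) > DEVICE_BANDWIDTH:
                continue  # infeasible pair, bypassed
            merged = [fs for fs in flow_sets if fs not in (a, b)] + [a | b]
            score = total_violations(merged)
            if best is None or score < best[0]:
                best = (score, merged)
        if best is None or best[0] >= total_violations(flow_sets):
            break  # no feasible merge reduces the violations
        flow_sets = best[1]
    return flow_sets


result = merge_until_clean([frozenset(name) for name in FLOWS])
```

Under these assumptions the loop ends with two flow sets, one holding flows a, b and c and the other holding flows d, e and f, although the order in which pairs are merged may differ from the order described for the example.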
In a next pass through the step 206, a pair of flow sets from among those having the flows d, e and f may be merged to alleviate the port violation of two at the source node 42. Thus, as shown in
At this point, the interconnect fabric has a port violation of one at each of the source nodes 40 and 42. Then, another pass through the step 206 may result in the selection and merger of the flow set corresponding to the interconnect device 160, which includes the flows a and b, with the flow set including the flow c, which alleviates the port violation of the source node 40. The merger of flows a, b and c by the device 160 is feasible insofar as the aggregate of these flows is 99 Mb/s, which is less than the maximum bandwidth for the device 160, which in the example is 100 Mb/s.
A further pass through the step 206 may result in the selection and merger of the flow set corresponding to the interconnect device 162, which includes the flows e and f, with the flow set including the flow d, which alleviates the port violation of the source node 42. The merger of flows d, e and f by the device 162 is feasible insofar as the aggregate of these flows is 99 Mb/s, which is less than the maximum bandwidth for the device 162, which in the example is 100 Mb/s.
Returning to the method 100 of
A method 300 illustrated in
While the complete failure of a source node or terminal node could be said to dominate other failures, the backup interconnect fabric is preferably primarily protective of communications between the nodes, rather than of the nodes themselves. Accordingly, such possible failures of source or terminal nodes may be disregarded for purposes of step 302.
Thus, in one embodiment, the dominating failures identified in the step 302 include interconnect devices in the primary fabric and links in the primary fabric that connect flows directly between source and terminal nodes (without the flows passing through any interconnect devices).
In the example, the interconnect devices 160 and 162 are identified in step 302 as dominant failure points. Because there are no links that connect flows directly between source and terminal nodes in the example, no such links are identified as dominant failures for the example.
In a step 304, each flow that is associated with each dominant failure may be identified. In other words, for each dominant failure identified in step 302, each flow that would be interrupted in the event of the dominant failure may be identified. The flows may also be grouped according to the dominant failure. Thus, for each dominant failure, the affected flows may be grouped together.
In the example, for the dominating failure of the interconnect device 160, the flows that would be interrupted include the flow of a, the flow of b and the flow of c. Thus, the group of flows associated with the failure of device 160 includes the flows a, b and c. For the dominant failure of the interconnect device 162, the flows that would be interrupted include the flow of d, the flow of e and the flow of f. Thus, the group of flows associated with the failure of the device 162 includes the flows d, e and f.
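The grouping of step 304 may be sketched as an inversion of a mapping from each flow to the primary fabric elements it passes through. The representation of paths as lists of element labels, and the names `PRIMARY_PATHS` and `flows_by_failure`, are assumptions made for illustration.

```python
from collections import defaultdict

# flow name -> dominant failure points on its path through the primary fabric
PRIMARY_PATHS = {
    "a": ["device 160"], "b": ["device 160"], "c": ["device 160"],
    "d": ["device 162"], "e": ["device 162"], "f": ["device 162"],
}


def flows_by_failure(paths):
    """Group flows by the dominant failure that would interrupt them."""
    groups = defaultdict(set)
    for flow, elements in paths.items():
        for element in elements:
            groups[element].add(flow)
    return dict(groups)


failure_groups = flows_by_failure(PRIMARY_PATHS)
```

For the example this yields one group of flows a, b and c for the device 160 and one group of flows d, e and f for the device 162.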
In a step 306, port violations which are associated with the arrangement of flow sets among the second set of ports are determined. The arrangement of flow sets may be determined from the step 202 (
In a step 308, feasibility of possible merges is determined. Recall that the primary fabric is designed to accommodate all of the flows simultaneously. The backup fabric, however, need only accommodate a portion of the flows at any one time. Feasibility of possible merges for the backup fabric in step 308 is determined based on these more limited considerations. In one embodiment, the backup fabric need only provide for flows that are interrupted by the occurrence in the primary fabric of a single dominant failure. Thus, where only one of two different flows would be interrupted during the occurrence of a single dominant failure, their bandwidth requirements need not be simultaneously met. Rather, the worst-case bandwidth requirement for merging the two flows in the backup fabric is the greater requirement of the two flows. For example, if one such flow requires 50 units of bandwidth and the other flow requires 60 units of bandwidth, the worst-case bandwidth requirement is 60 units of bandwidth. However, for pairs of flows that would both be interrupted by the occurrence of a dominant failure, their bandwidth requirements would need to be met simultaneously in order for the network to be able to withstand such a failure. For example, if one such flow requires 50 units of bandwidth and the other requires 60 units, then the aggregated bandwidth requirement to be met by the backup fabric in the event of the failure is 110 units of bandwidth.
Where multiple sets of flows would each be interrupted by the occurrence of different failures, then the worst case is the greatest sum of flow requirements among the sets. For example, assume that two flows that require 50 and 60 units of bandwidth, respectively, would be interrupted by one failure. Assume also that three other flows, each requiring 30 units of bandwidth, would be interrupted by a different failure. The sum of flow requirements for the first set is 110 (50 added to 60), whereas the sum of flow requirements for the second set is 90 (three times 30). Accordingly, the worst case that needs to be considered for merging these flow sets is the highest sum of 110 units of bandwidth.
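The worst-case rule for the backup fabric may be sketched as follows: for each single dominant failure, sum the requirements of the candidate flows it interrupts, then take the highest such sum. The function names and the flow and failure labels are hypothetical.

```python
def worst_case_bandwidth(bandwidths, failure_groups):
    """Highest bandwidth the merged flows need under any single failure.

    bandwidths: {flow name: required bandwidth} for the candidate merge.
    failure_groups: {failure: set of flow names that failure interrupts}.
    """
    worst = 0
    for interrupted in failure_groups.values():
        load = sum(bw for name, bw in bandwidths.items() if name in interrupted)
        worst = max(worst, load)
    return worst


def backup_merge_is_feasible(bandwidths, failure_groups, device_bandwidth):
    return worst_case_bandwidth(bandwidths, failure_groups) <= device_bandwidth


# Two flows interrupted by different failures: only the greater is needed.
separate = worst_case_bandwidth({"x": 50, "y": 60},
                                {"f1": {"x"}, "f2": {"y"}})
# The same two flows interrupted by one failure: both are needed at once.
together = worst_case_bandwidth({"x": 50, "y": 60}, {"f1": {"x", "y"}})
# Two flows under one failure, three 30-unit flows under another.
mixed = worst_case_bandwidth(
    {"x": 50, "y": 60, "p": 30, "q": 30, "r": 30},
    {"f1": {"x", "y"}, "f2": {"p", "q", "r"}})
```

These three calls reproduce the figures in the text: 60, 110 and 110 units of bandwidth, respectively.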
Initially, for the example backup fabric of
Then, in a step 310, at least one of the port violations is alleviated by merging a pair of the flow sets. Because the source nodes 40 and 42 have the worst port violations, a pair of flow sets at the node 40 may be merged first. For example, the flow sets having flows a and b may be merged by an interconnect device 164, as shown in
Referring to
Recall that for the primary fabric of
Thus, in the example, to determine the feasibility of merging the flow set having the flows a, b and c with the flow set having the flows d, e and f only the greater bandwidth of the two flow sets needs to be accommodated by an interconnect device. Because each of these flow sets has a bandwidth requirement of 99 Mb/s, the greater of the two is also 99 Mb/s. Because this requirement is less than the maximum bandwidth capacity of the available interconnect devices, this means that these two flow sets can be merged to alleviate the port violations remaining at the terminal nodes 50–54. This is shown in
The backup fabric of
Note that in
Under certain circumstances, a single-layer fabric may not eliminate all of the port violations, in which case the methods 200 and 300 by themselves may not result in a fabric design in which there are no port violations. Thus, in one embodiment, the present invention may address remaining port violations by recursively generating one or more additional layers of interconnect fabric nodes. For port violations at source nodes, the problem (i.e. the current fabric configuration and the applicable design information) may be recast such that the device nodes are treated as the terminal nodes. Then, one or more additional layers of device nodes may be inserted between the source nodes and the device nodes to relieve the port violations at source nodes. This results in links between device nodes and, thus, increases the number of layers in the interconnect fabric. Similarly, for terminal port violations, the problem may be recast such that the device nodes are treated as the source nodes. Then, one or more additional layers of device nodes may be inserted in between the device nodes and the terminal nodes to relieve the terminal node port violations. This also results in links between the device nodes and, thus, increases the number of layers in the interconnect fabric. Such a technique is disclosed in co-pending U.S. application Ser. No. 10/027,564, entitled, “Designing Interconnect Fabrics,” and filed Dec. 19, 2001, the contents of which are hereby incorporated by reference and which is a continuation-in-part of U.S. application Ser. No. 09/707,227, filed Nov. 16, 2000.
Both the primary interconnect fabric and the backup interconnect fabric are implemented together in the network.
The list of hosts and devices 410 may specify the hosts and devices which are to be interconnected by an interconnect fabric design 424.
The list of fabric node types 412 may specify available interconnect devices, such as hubs, routers, switches, etc.
The link type data 414 may specify a list of available communication links that may be employed in the interconnect fabric design 424 and any relevant constraints. There are numerous examples of available communication links, including fiber optic links, Fibre Channel links, wire-based links, and links such as SCSI, as well as wireless links.
The flow requirements data 416 may specify the desired flow requirements for the interconnect fabric design 424. The desired flow requirements may include bandwidth requirements for each pairing of the source and terminal nodes.
The port availability data 418 may specify the number of communication ports available on each source node and each terminal node and each available interconnect device.
The bandwidth data 420 may specify the bandwidth of each host and device port and each type of fabric node and link.
The cost data 422 may specify costs associated with the available communication links and interconnect devices that may be employed in the interconnect fabric design 424. The cost data 422 may also specify the costs of ports for source and terminal nodes and interconnect devices. Other relevant costs may also be indicated.
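One way to picture the design information is as a single record with a field for each of the inputs 410 through 422. The sketch below is illustrative only: the class name, field names and field types are assumptions, every value in the sample record is a placeholder rather than data from the example, and the helper `unknown_endpoints` is a hypothetical consistency check.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DesignInformation:
    hosts_and_devices: List[str]                      # list 410
    fabric_node_types: List[str]                      # list 412
    link_types: List[str]                             # data 414
    flow_requirements: Dict[Tuple[str, str], float]   # data 416
    port_availability: Dict[str, int]                 # data 418
    bandwidth: Dict[str, float]                       # data 420
    costs: Dict[str, float]                           # data 422


def unknown_endpoints(info):
    """Flow endpoints that are missing from the list of hosts and devices."""
    known = set(info.hosts_and_devices)
    return sorted({node for pair in info.flow_requirements
                   for node in pair if node not in known})


# Placeholder values, chosen only to show the shape of the record.
sample = DesignInformation(
    hosts_and_devices=["host-1", "disk-1"],
    fabric_node_types=["switch"],
    link_types=["fiber"],
    flow_requirements={("host-1", "disk-1"): 33.0},
    port_availability={"host-1": 2, "disk-1": 4, "switch": 8},
    bandwidth={"switch": 100.0, "fiber": 100.0},
    costs={"switch": 1.0, "fiber": 1.0},
)
```

A check such as `unknown_endpoints(sample)` returning an empty list indicates that every flow requirement refers to a listed host or device.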
The interconnect fabric design 424 generated by the fabric design tool 100 includes a list of the physical communication links and interconnect devices and ports, etc. and may include cost data.
The foregoing detailed description of the present invention is provided for the purposes of illustration and is not intended to be exhaustive or to limit the invention to the precise embodiment disclosed. Accordingly, the scope of the present invention is defined by the appended claims.
This is a continuation-in-part of U.S. application Ser. No. 09/707,227, filed Nov. 6, 2000, the contents of which are hereby incorporated by reference.
Number | Date | Country |
---|---|---|
20020091804 A1 | Jul 2002 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 09707227 | Nov 2000 | US |
Child | 10052682 | | US |