A SAN is used to interconnect nodes within a distributed computer system, such as a cluster. The SAN is a type of network that provides high bandwidth, low latency communication with a very low error rate. SANs often utilize fault-tolerant technology to assure high availability. The performance of a SAN resembles a memory subsystem more than a traditional local area network (LAN).
The preferred embodiments will be described as implemented in the ServerNet™ (ServerNet) architecture, manufactured by the assignee of the present invention, which is a layered transport protocol for a System Area Network (SAN). The ServerNet II™ protocol layers for an end node and for a routing node are illustrated in
Support for two ports enables a ServerNet™ SAN to be configured in both non-redundant and redundant (fault tolerant, or FT) SAN configurations as illustrated in FIG. 2 and FIG. 3. On a fault tolerant network, a port of each end node may be connected to each network to provide continued message communication in the event of failure of one of the SANs. In the fault tolerant SAN, nodes may be also ported into a single fabric or single ported end nodes may be grouped into pairs to provide duplex FT controllers. The fabric is the collection of routers, switches, connectors, and cables that connects the nodes in a network.
The SAN includes end nodes and routing nodes connected by physical links. End nodes generate and consume data packets. Routing nodes never generate or consume data packets but simply pass the packets along from the source end node to the destination end node.
Each node includes one or more full duplex ports, each connected to a physical link. A link layer protocol (LLP) manages the flow of status and packet data between ports on independent nodes.
The ServerNet™ SAN has the ability to perform system management from a single point anywhere in the SAN. SAN management performs many functions including collection of error information to isolate faults to the link or module where the faults occurred.
An “In Band Control” or IBC mechanism supports a low overhead way of performing SAN management functions. The term “in band” indicates that the network management control data travels over the existing SAN links, with no separate cable or LAN connection. In contrast to data packets, both routing nodes and end nodes generate and consume IBC packets. IBC packets are not routed like data packets; instead, each IBC packet contains embedded source routing information. Each router or end node that receives the IBC packet forwards it to the next destination in the source route list.
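By way of illustration only, the source-routed forwarding of an IBC packet can be sketched as follows. The packet layout, the class name `IBCPacket`, the functions `forward` and `deliver`, and the node names are all hypothetical; the actual encoding of the source route in an IBC packet is not specified here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IBCPacket:
    """Hypothetical IBC packet carrying its own route (not the real wire format)."""
    payload: str
    source_route: List[str]                      # remaining destinations, in order
    hops_taken: List[str] = field(default_factory=list)


def forward(packet: IBCPacket, current_node: str) -> Optional[str]:
    """Handle the packet at current_node and return the next destination.

    Returns None when the source route list is exhausted, meaning the
    packet is consumed at current_node.
    """
    packet.hops_taken.append(current_node)
    if not packet.source_route:
        return None
    return packet.source_route.pop(0)


def deliver(packet: IBCPacket, origin: str) -> str:
    """Follow the embedded source route from origin; return the consuming node."""
    node: Optional[str] = origin
    last = origin
    while node is not None:
        last = node
        node = forward(packet, node)
    return last


pkt = IBCPacket("read status", ["router1", "router2", "nodeB"])
consumer = deliver(pkt, "nodeA")
```

In this sketch no routing table is consulted at any hop; each router or end node only reads the next entry of the list carried in the packet.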
The ServerNet™ SAN includes a maintenance system having responsibility for system initialization, fault reporting, diagnostics, and environmental control. A pair of service processors (SPs) manage the maintenance system. The SPs function as ServerNet™ I/O controllers and communicate with each other only via the ServerNet™ SAN.
The maintenance system uses dual system-maintenance buses which form redundant trees, independent of normal system functional paths and provide a path of two industry standard interconnects. The maintenance system controls, initializes, tests, and monitors all ASIC operations and provides a means for ASIC initialization, SAN topology determination, and error reporting.
One function of the link layer protocol is to maintain a “this link bad” (TLB) protocol. TLB commands are used to communicate exception status between the nodes comprising a link. Status flags and other state information are used to communicate exception status to software either directly (for end nodes) or via the maintenance system (for routing nodes). A port transmits a TLB command when the link is alive and one of the exceptions listed in the table of
The receive FIFO overflow type of exception indicates that the receive FIFO was full when new packet data was received and that the new data has been discarded. The packet CRC type of exception indicates that a packet framed by a TPG command failed the CRC check. The packet length type of exception indicates that the receive link layer protocol logic has processed a packet with a data length greater than 1,796 data bytes. The packet framing type of exception indicates that a packet terminated by an IDLE command was received. The link alive type of exception indicates that the link alive status has changed state, e.g., from the status of link alive to link dead. The command type of exception indicates that an illegal or unsupported command was received or that a TLB or OLB command was received. Note that one of the conditions that results in transmission of a TLB command is the receipt of a TLB command. Thus all link errors result in the transmission of TLB commands in both directions on a link.
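As a non-limiting sketch, the packet-related exception types described above can be modeled as a simple classification. The names `LinkException` and `classify_packet`, the string encoding of the terminating command, and the order in which the checks are applied are assumptions made for illustration; only the exception types and the 1,796-byte limit are taken from the description.

```python
from enum import Enum
from typing import Optional

MAX_DATA_BYTES = 1796  # limit stated in the description


class LinkException(Enum):
    RX_FIFO_OVERFLOW = "receive FIFO overflow"
    PACKET_CRC = "packet CRC"
    PACKET_LENGTH = "packet length"
    PACKET_FRAMING = "packet framing"
    LINK_ALIVE = "link alive"   # link alive status changed state
    COMMAND = "command"         # illegal/unsupported command, or TLB/OLB received


def classify_packet(data_len: int, crc_ok: bool, terminator: str,
                    fifo_full: bool) -> Optional[LinkException]:
    """Return the exception raised by one received packet, or None.

    terminator is the command that ended the packet ("TPG" or "IDLE").
    The check order below is an assumption, not taken from the description.
    """
    if fifo_full:
        return LinkException.RX_FIFO_OVERFLOW   # new data is discarded
    if terminator == "IDLE":
        return LinkException.PACKET_FRAMING     # packet terminated by IDLE
    if data_len > MAX_DATA_BYTES:
        return LinkException.PACKET_LENGTH
    if terminator == "TPG" and not crc_ok:
        return LinkException.PACKET_CRC         # TPG-framed packet failed CRC
    return None
```

The link alive and command exception types are listed in the enumeration for completeness but are not produced by `classify_packet`, since they arise from link state and command decoding rather than from packet contents.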
The ServerNet™ SAN has been enhanced to improve performance. The original ServerNet™ SAN configuration is designated SNet I and the improved configuration is designated SNet II. Among the improvements implemented in the SNet II SAN are a higher transfer rate and a different symbol encoding. To attach SNet I end nodes and routing nodes to serial cables, a special two-port router ASIC matches SNet I devices to SNet II devices. This two-port router will be referred to as a “link-extender”; the term, as used herein, is only a convenient name and does not connote any limitation on the functioning of the device.
A typical connection utilizing link-extenders is depicted in FIG. 5. Node A is connected to first link extender x by link 1. The first link extender x is coupled by link 2 to a second link extender y. The second link extender y is coupled by link 3 to Node B. The link extenders normally operate without intervention of system error handling software. Accordingly, the system error handling software sees the connection as a single link and is not able to isolate faults to a particular physical link, e.g., link 1, 2, or 3 in FIG. 5.
According to one aspect of the invention, a link extender includes error propagation logic that detects an exception on a link connected to the link extender and transmits an Other Link Bad (OLB) command on the other link connected to the link extender.
According to another aspect of the invention, a fault occurring in a first, second or third link included in a connection between two nodes (A and B) is isolated utilizing a this link bad (TLB)/other link bad (OLB) protocol. When the first link is bad, a TLB is received by node A and an OLB by node B. When the second link is bad, both nodes receive OLBs. When the third link is bad, node A receives an OLB and node B receives a TLB.
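The three cases just described amount to a lookup from the pair of exceptions reported at nodes A and B to the faulty link. A minimal sketch follows; the function name `isolate_fault` and the string encoding of the exceptions are hypothetical, and combinations not covered by the three cases are returned as unresolved rather than guessed.

```python
from typing import Optional

# (exception at node A, exception at node B) -> faulty link number
FAULT_TABLE = {
    ("TLB", "OLB"): 1,   # first link bad
    ("OLB", "OLB"): 2,   # second link bad
    ("OLB", "TLB"): 3,   # third link bad
}


def isolate_fault(exc_at_a: str, exc_at_b: str) -> Optional[int]:
    """Return the faulty link (1, 2, or 3), or None if the pair is not covered."""
    return FAULT_TABLE.get((exc_at_a, exc_at_b))
```

For example, `isolate_fault("OLB", "OLB")` identifies link 2, the link between the two link extenders, which neither end node touches directly.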
According to another aspect of the invention, the TLB and OLB exceptions are reported by the nodes so that the SAN manager can isolate the faults.
Other features and advantages of the invention will be apparent in view of the following detailed description and appended drawings.
The receive side of the local port 102 includes a receive media access layer (RxMAC) 110, a receive synchronization module (RxSFIFO) 112, a loop back prevention module (LBP) 114, and a receive link layer protocol module (RxLLP) 116. The transmit side of the local port 102 includes the transmit link layer protocol module (TxLLP) 117 and is coupled to the receive side of the local port 102 by the little FIFO 106.
The receive side of the remote port 104 includes a receive media access layer (RxMAC) 120, a receive synchronization module (RxSFIFO) 122, a loop back prevention module (LBP) 124, and a receive link layer protocol module (RxLLP) 126. The transmit side of the remote port 104 includes a transmit link layer protocol module (TxLLP) 130, a transmit synchronization module (TxSFIFO) 132, and a transmit media access layer module (TxMAC) 134. The transmit side of the local port 102 is coupled to the receive side of the remote port 104 by the big FIFO 108. A big FIFO 108 is required because of the large latency of long serial links.
In a currently preferred embodiment, a GigaBlaze™ control register contains the control and status information needed to operate the LSI Logic GigaBlaze™ G10™ module, which implements parallel/serial conversion.
As described above, one function of the link layer protocol (LLP) is to detect the exceptions listed in FIG. 4 and communicate exception status between the nodes comprising a link utilizing the “this link bad” (TLB) command. A port must transmit a TLB command when all of the following conditions are true:
One of the events that triggers an exception is the receipt of a TLB command. Thus, as depicted in
In a preferred embodiment of the invention, “other link bad” (OLB) commands are used by link extender nodes to report link exceptions detected on one of their ports to the node at the opposite end of the link extender's other link.
A port of a link extender must transmit an OLB command when all of the following conditions are true:
The use of the OLB command is illustrated by FIG. 7. When the control logic 136 of a link extender detects an exception on a link, it transmits a TLB command back to the node on the opposite end of that link and instructs the transmitter in its other port to transmit an OLB command.
Since receipt of an OLB or a TLB command is one of the link exception conditions, the OLB command is propagated in both directions until it arrives at an end or routing node. Since the end and routing nodes capture the first exception, software can identify a faulty link by examining exceptions captured in the end and/or routing nodes at each end of a connection.
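The propagation just described can be illustrated with a simplified model, offered only as a sketch. It assumes a connection of end nodes A and B joined through a chain of link extenders, with links numbered from 1 at node A; it assumes the node directly attached to the faulty link captures a TLB, and that each link extender crossed on the way to an end node converts the indication into an OLB. The function name `first_exceptions` is hypothetical, and timing, FIFOs, and the re-enabling of exception reporting are not modeled.

```python
from typing import Tuple


def first_exceptions(num_extenders: int, bad_link: int) -> Tuple[str, str]:
    """Return the first exception captured at (node A, node B).

    Links are numbered 1 .. num_extenders + 1, starting at node A.
    """
    num_links = num_extenders + 1
    if not 1 <= bad_link <= num_links:
        raise ValueError("bad_link must be between 1 and %d" % num_links)

    def arriving_command(extenders_crossed: int) -> str:
        command = "TLB"              # sent on the faulty link itself
        for _ in range(extenders_crossed):
            command = "OLB"          # an extender reports on its other link with OLB
        return command

    toward_a = bad_link - 1          # extenders between the fault and node A
    toward_b = num_links - bad_link  # extenders between the fault and node B
    return arriving_command(toward_a), arriving_command(toward_b)
```

With two link extenders, as in the connection of FIG. 5, this model gives a TLB at node A and an OLB at node B for a fault on link 1, OLBs at both nodes for link 2, and an OLB at node A and a TLB at node B for link 3.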
The receive logic in each port of the link extender node monitors the receive link for protocol violations. When any exception is detected:
Steps 3 and 4 assure that the link exception is reported to both ends of the connection and re-enable link exception reporting. The forwarded OLB allows fault isolation logic to identify which link in the connection detected the exception. Such fault isolation logic (according to an aspect of the invention) is depicted via the block diagram of a network 1100 in FIG. 11.
Network 1100 includes: a node A (1102); a link-extension device X (1106); a first link coupling node A (1102) and link-extension device X (1106); a link-extension device Y (1110); a second link (1108) coupling link-extension device X (1106) and link-extension device Y (1110); a node B (1114); a third link (1112) coupling link-extension device Y (1110) and node B (1114); and fault-isolation logic 1116.
The faults are isolated using the exceptions reported by the nodes at the ends of the connection. Operation of fault-isolation logic 1116 is described in terms of the relationships represented in FIG. 8.
The table in
In
In each of cases 1, 3, 4, and 6 depicted in
The invention has now been described with reference to the preferred embodiments. Alternatives and substitutions will now be apparent to persons of skill in the art. In particular, the exceptions described above are by way of example and are not critical to practicing the invention. Accordingly, it is not intended to limit the invention except as provided by the appended claims.