System and method for supporting automatic disabling of degraded links in an InfiniBand (IB) network

Description

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF INVENTION

The present invention is generally related to computer systems, and is particularly related to supporting an InfiniBand (IB) network.

BACKGROUND

The interconnection network plays a beneficial role in the next generation of supercomputers, clusters, and data centers. High performance network technology, such as the InfiniBand (IB) technology, is replacing proprietary or low-performance solutions in the high performance computing domain, where high bandwidth and low latency are the key requirements. For example, IB installations are used in supercomputers such as Los Alamos National Laboratory's Roadrunner, Texas Advanced Computing Center's Ranger, and Forschungszentrum Juelich's JuRoPa.

IB was first standardized in October 2000 as a merge of two older technologies called Future I/O and Next Generation I/O. Due to its low latency, high bandwidth, and efficient utilization of host-side processing resources, it has been gaining acceptance within the High Performance Computing (HPC) community as a solution to build large and scalable computer clusters. The de facto system software for IB is OpenFabrics Enterprise Distribution (OFED), which is developed by dedicated professionals and maintained by the OpenFabrics Alliance. OFED is open source and is available for both GNU/Linux and Microsoft Windows.

SUMMARY

Described herein is a system and method that can support automatic disabling of degraded links in an InfiniBand (IB) network. At least one node in a fabric can monitor one or more local ports of the at least one node for one or more error states associated with a link at the at least one node, wherein the link is connected to a local port of the at least one node. The at least one node further allows a subnet manager to observe the one or more error states associated with the link at the at least one node, and allows the subnet manager to set the link in a basic state if the observed error states exceed a threshold. In this basic state, the link allows only SMP traffic and prevents data traffic and non-SMP based management traffic.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows an illustration of a fabric model in a middleware environment in accordance with an embodiment of the invention.

FIG. 2 shows an illustration of supporting automatic disabling of degraded links in a middleware environment in accordance with an embodiment of the invention.

FIG. 3 illustrates an exemplary flow chart for alleviating network instability in a middleware environment in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

Described herein is a system and method that can support automatic disabling of degraded links in an interconnected network, such as an InfiniBand (IB) network.

FIG. 1 shows an illustration of a fabric model in a middleware environment in accordance with an embodiment of the invention. As shown in FIG. 1, an interconnected network, or a fabric 100, can include switches 101-103, bridges and routers 104, host channel adapters (HCAs) 105-106 and designated management hosts 107. Additionally, the fabric can include, or be connected to, one or more hosts 108 that are not designated management hosts.

The designated management hosts 107 can be installed with HCAs 105, 106, a network software stack and relevant management software in order to perform network management tasks. Furthermore, firmware and management software can be deployed on the switches 101-103, and the bridges and routers 104 to direct traffic flow in the fabric. Here, the host HCA drivers, OS and Hypervisors on hosts 108 that are not designated management hosts may be considered outside the scope of the fabric from a management perspective.

The fabric 100 can be in a single media type, e.g. an IB only fabric, and be fully connected. The physical connectivity in the fabric ensures in-band connectivity between any fabric components in the non-degraded scenarios. Alternatively, the fabric can be configured to include Ethernet (Enet) connectivity outside gateway (GW) external ports on a gateway 109. Additionally, it is also possible to have independent fabrics operating in parallel as part of a larger system. For example, the different fabrics can only be indirectly connected via different HCAs or HCA ports.

InfiniBand (IB) Architecture

IB architecture is a serial point-to-point technology. Each of the IB networks, or subnets, can include a set of hosts interconnected using switches and point-to-point links. A single subnet can be scalable to more than ten-thousand nodes and two or more subnets can be interconnected using an IB router. The hosts and switches within a subnet are addressed using local identifiers (LIDs), e.g. a single subnet may be limited to 49151 unicast addresses.
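By way of a non-limiting illustration, the addressing limit noted above can be checked with a short sketch. The sketch assumes that the 16-bit LID space reserves the range 0x0001 through 0xBFFF for unicast addresses, which is consistent with the figure of 49151; the function names are hypothetical.

```python
# Assumption: unicast LIDs occupy 0x0001..0xBFFF of the 16-bit LID space.
UNICAST_LID_MIN = 0x0001
UNICAST_LID_MAX = 0xBFFF


def is_unicast_lid(lid: int) -> bool:
    """Return True if lid falls inside the assumed unicast range."""
    return UNICAST_LID_MIN <= lid <= UNICAST_LID_MAX


def unicast_lid_count() -> int:
    """Number of distinct unicast LIDs available in a single subnet."""
    return UNICAST_LID_MAX - UNICAST_LID_MIN + 1
```

Under this assumption, unicast_lid_count() evaluates to 49151, and LID 0 is excluded from the unicast range.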

An IB subnet can employ at least one subnet manager (SM) which is responsible for initializing and starting up the subnet, including the configuration of all the IB ports residing on switches, routers and host channel adapters (HCAs) in the subnet. The SM's responsibility also includes routing table calculation and deployment. Routing of the network aims at obtaining full connectivity, deadlock freedom, and load balancing between all source and destination pairs. Routing tables can be calculated at network initialization time and this process can be repeated whenever the topology changes in order to update the routing tables and ensure optimal performance.

At the time of initialization, the SM starts in the discovering phase where the SM does a sweep of the network in order to discover all switches and hosts. During the discovering phase, the SM may also discover any other SMs present and negotiate who should be the master SM. When the discovering phase is completed, the SM can enter a master phase. In the master phase, the SM proceeds with LID assignment, switch configuration, routing table calculations and deployment, and port configuration. At this point, the subnet is up and ready to use.

After the subnet is configured, the SM can monitor the network for changes (e.g. a link goes down, a device is added, or a link is removed). If a change is detected during the monitoring process, a message (e.g. a trap) can be forwarded to the SM and the SM can reconfigure the network. Part of the reconfiguration process, or a heavy sweep process, is the rerouting of the network which can be performed in order to guarantee full connectivity, deadlock freedom, and proper load balancing between all source and destination pairs.

The HCAs in an IB network can communicate with each other using queue pairs (QPs). A QP is created during the communication setup, and a set of initial attributes such as QP number, HCA port, destination LID, queue sizes, and transport service are supplied. The QP associated with the HCAs in a communication is destroyed when the communication is over. An HCA can handle many QPs. Each QP consists of a pair of queues, a send queue (SQ) and a receive queue (RQ), and there is one such pair present at each end-node that is participating in the communication. The send queue holds work requests to be transferred to the remote node, while the receive queue holds information on what to do with the data received from the remote node. In addition to the QPs, each HCA can have one or more completion queues (CQs) that are associated with a set of send and receive queues. The CQ holds completion notifications for the work requests posted to the send and receive queues.
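For illustration only, the relationship between a QP, its send and receive queues, and an associated CQ can be modeled as follows. This is a simplified in-memory sketch and not a verbs API; the class names, the fields, and the transfer helper are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CompletionQueue:
    # Completion notifications for work requests posted to associated queues.
    entries: deque = field(default_factory=deque)


@dataclass
class QueuePair:
    qp_number: int
    dest_lid: int
    cq: CompletionQueue
    send_queue: deque = field(default_factory=deque)  # work requests to transfer
    recv_queue: deque = field(default_factory=deque)  # where to place received data

    def post_send(self, payload: bytes) -> None:
        self.send_queue.append(payload)

    def post_recv(self, buffer_id: str) -> None:
        self.recv_queue.append(buffer_id)


def transfer(sender: QueuePair, receiver: QueuePair) -> None:
    """Consume one send request and one receive request, then post completions."""
    payload = sender.send_queue.popleft()
    buffer_id = receiver.recv_queue.popleft()
    sender.cq.entries.append(("send", sender.qp_number))
    receiver.cq.entries.append(("recv", buffer_id, payload))
```

In this model, each end-node holds its own QueuePair instance, and a completion entry appears on each side once a posted work request has been consumed.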

The IB architecture is a flexible architecture. Configuring and maintaining an IB subnet can be carried out via special in-band subnet management packets (SMPs). The functionalities of a SM can, in principle, be implemented from any node in the IB subnet. Each end-port in the IB subnet can have an associated subnet management agent (SMA) that is responsible for handling SMP based request packets that are directed to it. In the IB architecture, a same port can represent a SM instance or other software component that uses SMP based communication. Thus, only a well defined sub-set of SMP operations can be handled by the SMA.

SMPs use dedicated packet buffer resources in the fabric, e.g. a special virtual lane (VL15) that is not flow-controlled (i.e. SMP packets may be dropped in the case of buffer overflow). Also, SMPs can use either the routing that the SM sets up based on end-port Local Identifiers (LIDs), or SMPs can use direct routes where the route is fully defined by the sender and embedded in the packet. Using direct routes, the packet's path goes through the fabric in terms of an ordered sequence of port numbers on HCAs and switches.
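As a minimal sketch of the direct route concept, the following example resolves an ordered sequence of port numbers against a small topology map. The topology representation (a mapping from node name to a mapping from port number to neighbor) and the function name are assumptions made for illustration and do not reflect the SMP wire format.

```python
from typing import Dict, Sequence

# Hypothetical topology model: node name -> {outgoing port number -> neighbor node}.
Topology = Dict[str, Dict[int, str]]


def follow_direct_route(topology: Topology, start: str, out_ports: Sequence[int]) -> str:
    """Walk the fabric hop by hop using the port numbers embedded by the sender."""
    node = start
    for port in out_ports:
        # A KeyError here means the route names a port with no attached link.
        node = topology[node][port]
    return node
```

Because the route is carried in the packet, no LID-based forwarding table is consulted in this sketch, which is why direct routes remain usable before the SM has configured the subnet.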

The SM can monitor the network for changes using SMAs that are presented in every switch and/or every HCA. The SMAs communicate changes, such as new connections, disconnections, and port state change, to the SM using traps and notices. A trap is a message sent to alert end-nodes about a certain event. A trap can contain a notice attribute with the details describing the event. Different traps can be defined for different events. In order to reduce the unnecessary distribution of traps, IB applies an event forwarding mechanism where end-nodes are required to explicitly subscribe to the traps they want to be informed about.
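The event forwarding mechanism described above can be sketched as a simple subscription table. The class name, the trap type strings, and the return format are hypothetical and are chosen only to show that a trap is forwarded solely to end-nodes that have explicitly subscribed to it.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class EventForwarder:
    """Hypothetical model of subscription-based trap forwarding."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, trap_type: str, end_node: str) -> None:
        self._subscribers[trap_type].add(end_node)

    def forward(self, trap_type: str, notice: str) -> List[Tuple[str, str]]:
        """Return (end_node, notice) deliveries for subscribed end-nodes only."""
        nodes = self._subscribers.get(trap_type, ())
        return [(node, notice) for node in sorted(nodes)]
```

An end-node that has not subscribed to a given trap type receives nothing in this sketch, which reflects the stated goal of reducing unnecessary distribution of traps.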

The subnet administrator (SA) is a subnet database associated with the master SM to store different information about a subnet. The communication with the SA can help the end-node to establish a QP by sending a general service management datagram (MAD) through a designated QP, e.g. QP1. Both sender and receiver require information such as source/destination LIDs, service level (SL), maximum transmission unit (MTU), etc. to establish communication via a QP. This information can be retrieved from a data structure known as a path record that is provided by the SA. In order to obtain a path record, the end-node can perform a path record query to the SA, e.g. using the SubnAdmGet/SubnAdmGetTable operation. Then, the SA can return the requested path records to the end-node.

The IB architecture provides partitions as a way to define which IB end-ports should be allowed to communicate with other IB end-ports. Partitioning is defined for all non-SMP packets on the IB fabric. The use of partitions other than the default partition is optional. The partition of a packet can be defined by a 16 bit P_Key that consists of a 15 bit partition number and a single bit member type (full or limited).
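For illustration, the P_Key layout described above can be decoded as follows. The sketch assumes that the membership bit is the most significant bit of the 16 bit value (set for full membership) and that two ports may communicate when their partition numbers match and at least one of them is a full member; the function names are hypothetical.

```python
from typing import Tuple

FULL_MEMBER_BIT = 0x8000  # assumption: top bit carries the member type
PARTITION_MASK = 0x7FFF   # low 15 bits carry the partition number


def parse_pkey(pkey: int) -> Tuple[int, bool]:
    """Split a 16 bit P_Key into (partition number, is_full_member)."""
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError("P_Key must fit in 16 bits")
    return pkey & PARTITION_MASK, bool(pkey & FULL_MEMBER_BIT)


def make_pkey(partition: int, full: bool) -> int:
    """Build a P_Key from a 15 bit partition number and a member type."""
    if not 0 <= partition <= PARTITION_MASK:
        raise ValueError("partition number must fit in 15 bits")
    return partition | (FULL_MEMBER_BIT if full else 0)


def may_communicate(pkey_a: int, pkey_b: int) -> bool:
    """Same partition, and at least one side is a full member (assumed rule)."""
    part_a, full_a = parse_pkey(pkey_a)
    part_b, full_b = parse_pkey(pkey_b)
    return part_a == part_b and (full_a or full_b)
```

Under these assumptions, two limited members of the same partition cannot exchange non-SMP packets with each other, while each can communicate with a full member.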

The partition membership of a host port, or an HCA port, can be based on the premise that the SM sets up the P_Key table of the port with P_Key values that correspond to the current partition membership policy for that host. In order to compensate for the possibility that the host may not be fully trusted, the IB architecture also defines that switch ports can optionally be set up to do partition enforcement. Hence, the P_Key tables of switch ports that connect to host ports can then be set up to reflect the same partitions as the host port is supposed to be a member of (i.e. in essence equivalent to switch enforced VLAN control in Ethernet LANs).

Since the IB architecture allows full in-band configuration and maintenance of an IB subnet via SMPs, the SMPs themselves are not subject to any partition membership restrictions. Thus, in order to avoid the possibility that any rogue or compromised node on the IB fabric is able to define an arbitrary fabric configuration (including partition membership), other protection mechanisms are needed.

M_Keys can be used as the basic protection/security mechanism in the IB architecture for SMP access. An M_Key is a 64 bit value that can be associated individually with each individual node in the IB subnet, and where incoming SMP operations may be accepted or rejected by the target node depending on whether the SMP includes the correct M_Key value (i.e. unlike P_Keys, the ability to specify the correct M_Key value—like a password—represents the access control).

By using an out-of-band method for defining M_Keys associated with switches, it is possible to ensure that no host node is able to set up any switch configuration, including partition membership for the local switch port. Thus, an M_Key value is defined when the switch IB links become operational. Hence, as long as the M_Key value is not compromised or “guessed” and the switch out-of-band access is secure and restricted to authorized fabric administrators, the fabric is secure.

Furthermore, the M_Key enforcement policy can be set up to allow read-only SMP access for all local state information except the current M_Key value. Thus, it is possible to protect the switch based fabric from un-authorized (re-)configuration, and still allow host based tools to perform discovery and diagnostic operations.
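The M_Key based access control and the read-only enforcement policy described above can be sketched as a small decision function. The method names "Get" and "Set", the allow_open_reads flag, and both function names are assumptions made for illustration; the sketch does not reproduce the protection levels defined by the IB specification.

```python
from typing import Optional


def smp_permitted(method: str, smp_mkey: int, node_mkey: int, allow_open_reads: bool) -> bool:
    """Decide whether the target node accepts an incoming SMP operation."""
    if smp_mkey == node_mkey:
        # The sender presented the correct 64 bit M_Key: full access.
        return True
    # Without the correct M_Key, only reads may pass, and only if policy allows.
    return allow_open_reads and method == "Get"


def visible_mkey(smp_mkey: int, node_mkey: int) -> Optional[int]:
    """The current M_Key is disclosed only to a requester that already knows it."""
    return node_mkey if smp_mkey == node_mkey else None
```

With allow_open_reads set, a host based diagnostic tool can read local state information without the M_Key, but cannot alter the configuration or learn the M_Key value itself.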

The flexibility provided by the IB architecture allows the administrators of IB fabrics/subnets, e.g. HPC clusters, to decide whether to use embedded SM instances on one or more switches in the fabric and/or set up one or more hosts on the IB fabric to perform the SM function. Also, since the wire protocol defined by the SMPs used by the SMs is available through APIs, different tools and commands can be implemented based on use of such SMPs for discovery, diagnostics and control independently of any current Subnet Manager operation.

From a security perspective, the flexibility of IB architecture indicates that there is no fundamental difference between root access to the various hosts connected to the IB fabric and the root access allowing access to the IB fabric configuration. This is fine for systems that are physically secure and stable. However, this can be problematic for system configurations where different hosts on the IB fabric are controlled by different system administrators, and where such hosts should be logically isolated from each other on the IB fabric.

Automatic Disabling of Degraded Links

When IB links are disabled due to excessive error rates, it is difficult to observe the current error rates of the link, or to perform additional testing to further diagnose or characterize the problem. If a repair action is performed to correct the problem, e.g. by replacing a cable or correcting the seating of a cable connector, then the link needs to be enabled before it can be tested, in which case the subnet manager may also enable/use the link for normal data traffic.

Automated logic can be used to determine whether there are excessive error rates. This automated logic also can disable the link when there are excessive error rates. Since the link has been completely disabled, it is no longer possible to use the link for basic connectivity such as supporting management operations between IB nodes. However, the link may still be capable of being used for supporting management operations between IB nodes, even when a link is not reliable enough for normal data traffic. Additionally, there may be no corresponding automated operation to enable the link again as a result of detecting a significantly lower error rate over a significant period of time.

In accordance with an embodiment of the invention, the fabric can ensure that severely degraded links are not used for data traffic. The fabric allows for the definition of basic policies for automatic disabling of degraded links based on associated error rate thresholds. The fabric also allows for the specification of policies to automatically define degraded links that are in a non-routable state and to have the SM automatically observe this state.

Furthermore, subnet level error reporting can be supported in the fabric. The subnet level error reporting can be beneficial in terms of ensuring the states that the SM monitors are coherent, or consistent. The SM can be aware of the error state, and/or the explicit disabled state, that are associated with the links or with the pair of ports that each link represents.

Additionally, a local daemon can take the problem link out of normal service as soon as possible without permanently disabling it. For example, the local daemon can reset the link, instead of disabling the link, or marking the link as a bad link and leaving it up to the SM to take further action (i.e. potentially with longer reaction time). By resetting the link, the local daemon allows the normal data traffic to be disabled right away, and there may not be any delay for waiting for the SM to change the link state. Thus, resetting the link, rather than disabling the link, can bring the link to the same basic state as the SM can use initially. Furthermore, the SM can request the link to be enabled again and then keep the link in a basic state or at least not allow the link to be used for data traffic, even in the case where the local daemon disables the link. Here, the disabling of a link may remove the last link between the SM and the relevant node (where the daemon operates), in which case the SM may not be able to request for the enabling of the link.

FIG. 2 shows an illustration of supporting automatic disabling of degraded links in a middleware environment in accordance with an embodiment of the invention. As shown in FIG. 2, a fabric 200 includes a SM 202. At least one node 201 in the fabric 200 can have one or more ports 211-213, each of which is associated with one or more links 221-223. In the example as shown in FIG. 2, a link 222 associated with the port 212 on the node 201 is degraded.

A daemon 203 can be used on the node 201 to constantly monitor the symbol error and other error conditions associated with local links associated with the one or more ports 211-213. The node can perform a disable operation on the link if the various error rates exceed a configurable threshold during a configurable time interval. Such a disable operation can be reported to the SM 202, e.g. via conventional system management interfaces. A disable error state 209 can be explicitly recorded in a local stable storage 207, so that the link can avoid being unconditionally enabled again following a reset of the node 201. In one example, a configurable policy can be defined to remove the disable error state 209 following a reset of the node 201.
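By way of example only, the monitoring performed by the daemon 203 can be sketched as a sliding-window check of error counts against a configurable threshold and time interval. In this sketch, a plain mapping stands in for the local stable storage 207, and the class name, method names, and port label are hypothetical.

```python
from collections import deque
from typing import Deque, MutableMapping, Tuple


class LinkErrorMonitor:
    """Hypothetical per-port monitor; `store` stands in for stable storage."""

    def __init__(self, threshold: int, interval_s: float,
                 store: MutableMapping[str, bool], port: str) -> None:
        if interval_s <= 0:
            raise ValueError("interval_s must be positive")
        self.threshold = threshold
        self.interval_s = interval_s
        self.store = store
        self.port = port
        self._samples: Deque[Tuple[float, int]] = deque()  # (timestamp, new errors)

    def record(self, now: float, new_errors: int) -> bool:
        """Add a sample; return True if the link should be disabled."""
        self._samples.append((now, new_errors))
        # Drop samples that have aged out of the configurable interval.
        while self._samples and self._samples[0][0] <= now - self.interval_s:
            self._samples.popleft()
        if sum(count for _, count in self._samples) > self.threshold:
            self.store[self.port] = True  # record the disable error state
            return True
        return False

    def disabled_after_reset(self, clear_on_reset: bool = False) -> bool:
        """Consult the recorded state following a reset of the node."""
        if clear_on_reset:
            self.store.pop(self.port, None)  # optional policy: forget the state
        return self.store.get(self.port, False)
```

Keeping the recorded state outside the monitor object is what allows the decision to survive a node reset, while the clear_on_reset flag models the configurable policy for removing the state.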

Instead of performing a local unconditional disable operation for the local port/link, the error state 209 can be recorded and made available to the SM 202 via subnet management traps, or be observable via SMP based methods, e.g. via extending the set of SMA attributes associated with the port. The SM 202 can observe the port error states using the designated SMP methods, e.g. using an extension to the normal subnet sweep operations. Also, the SM 202 can observe the port error state as a result of receiving the corresponding SMPs.

When the SM 202 detects a port 212 with an excessive error rate attribute set, the SM 202 can consider the corresponding link 222 as not operational in terms of being used by normal data traffic. The SM can still use this link for further discovery and other subnet management operations. However, the SM 202 is prevented from performing further discovery beyond the remote side of the relevant link in the case of a non-operational link, a not fully responsive remote SMA, or an unknown remote M_Key.

Furthermore, the SM 202 can leave/set the link 222 in a basic state, which allows SMP traffic and prevents both data traffic and non-SMP based management traffic. In this case, the link 222 can be tested using SMP based traffic in addition to being used for SMP based management operations. Testing can be initiated by the SM 202, or can be carried out by daemons 203 associated with the involved nodes, or by another centralized management entity.
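The effect of the basic state on different traffic classes can be illustrated with the following sketch. The enumeration names and helper functions are hypothetical and do not correspond to IB port state encodings; the sketch only shows that a link in the basic state admits SMP traffic and is excluded from the set of links used for routing data traffic.

```python
from enum import Enum
from typing import Dict, List


class LinkState(Enum):
    NORMAL = "normal"
    BASIC = "basic"


class Traffic(Enum):
    SMP = "smp"    # subnet management packets
    GMP = "gmp"    # non-SMP based management traffic
    DATA = "data"  # normal data traffic


def link_admits(state: LinkState, traffic: Traffic) -> bool:
    """A link in the basic state carries SMP traffic only."""
    if state is LinkState.BASIC:
        return traffic is Traffic.SMP
    return True


def routable_links(links: Dict[str, LinkState]) -> List[str]:
    """Links that the SM may include when routing normal data traffic."""
    return sorted(name for name, state in links.items() if state is LinkState.NORMAL)
```

In terms of FIG. 2, a mapping that places link 222 in the basic state and links 221 and 223 in the normal state would yield only links 221 and 223 as routable, while link 222 would remain reachable for SMP based discovery and testing.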

Additionally, the SM 202 can enable the link 222 for normal data traffic for certain specific purposes. For example, non-SMP based traffic can be used to achieve higher levels of load/stress of the link for providing more elaborate stress testing of the link 222. Also, the link 222 can be used for other non-SMP based management traffic, such as the management traffic using general management packet (GMP) type MADs or higher level protocols like internet protocol over InfiniBand (IPoIB), in order to facilitate generic communication between any management entities associated with the various nodes. In these cases, the SM 202 may not include the link in the set of links through which normal data traffic is routed.

When the link is physically down and then comes back again, e.g. due to a cable replacement, the SM 202 can require a specific test procedure to be carried out for the link before the SM can fully include the link in the subnet. Such testing can be implemented by per node daemons 203 or by the SM 202 or by another centralized management entity. Then, the test procedure can be coordinated via additional SMP based methods and attributes associated with each port. When further testing shows a significantly improved error rate, the port attribute indicating excessive errors can be reset, and the SM 202 can then again include the link in the subnet topology for normal data traffic without any constraints.
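A minimal sketch of such a test procedure is given below, assuming that the test consists of a fixed number of probes and that the link qualifies for re-inclusion when the observed failure rate does not exceed a configured maximum. The attribute and function names are hypothetical, and the probe callable stands in for whatever SMP based test traffic an implementation would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PortAttributes:
    # Hypothetical port attribute indicating that excessive errors were observed.
    excessive_errors: bool = True


def run_link_test(send_probe: Callable[[int], bool], probes: int,
                  max_error_rate: float) -> bool:
    """Send `probes` test packets; pass if the failure rate is low enough."""
    if probes <= 0:
        raise ValueError("probes must be positive")
    failures = sum(1 for i in range(probes) if not send_probe(i))
    return failures / probes <= max_error_rate


def readmit_link(attrs: PortAttributes, test_passed: bool) -> bool:
    """Reset the excessive-error attribute on success; report routability."""
    if test_passed:
        attrs.excessive_errors = False
    return not attrs.excessive_errors
```

If the test fails, the attribute is left set in this sketch, so the link remains excluded from normal data traffic until a later test succeeds.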

FIG. 3 illustrates an exemplary flow chart for alleviating network instability in a middleware environment in accordance with an embodiment of the invention. As shown in FIG. 3, at step 301, at least one node in a fabric can monitor one or more local ports of the at least one node for one or more error states associated with a link at the at least one node, wherein the link is connected to a local port of the at least one node. Then, at step 302, a subnet manager can observe the one or more error states associated with the link at the at least one node. Finally, at step 303, the subnet manager can set the link to be in a basic state if the observed error states exceed a threshold.

The present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computer, computing device, machine, or microprocessor, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.

In some embodiments, the present invention includes a computer program product which is a storage medium or computer readable medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.

The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalence.

Claims

1. A method for supporting automatic disabling of degraded links in a fabric network that includes a subnet manager (SM) and a plurality of network nodes, the method comprising: monitoring, via a daemon process on a first network node in the fabric network, a local port of the first network node for one or more errors, wherein the one or more errors are associated with a link that connects the first network node and a second network node in the fabric network, wherein the link is in a first state and is configured to transfer both normal data and subnet management packet (SMP) data; in response to the one or more errors exceeding a configurable threshold, performing, by the daemon process, a disabling operation on the link to put the link into a second state, wherein the link in the second state is configured to transfer only SMP data; recording, by the daemon process, the state of the link in a persistent storage, wherein the recorded state is configured to prevent the link from being automatically enabled when the first network node is reset; and sending, via the link, one or more SMP messages to the subnet manager, wherein the one or more management data messages indicate that the link is disabled; wherein the subnet manager, upon receiving the one or more management data messages, operates to use the link to perform further discovery in the second network node, invoke the daemon process to initiate a test on the link, determine that errors at the local port of the first network node have dropped below the configurable threshold, and change the link from the second state to the first state; and wherein the daemon process, in response to the state change of the link from the second state to the first state, operates to remove the recorded state from the persistence storage when the first network node is reset.
2. The method according to claim 1, further comprising: requesting, via the subnet manager, the disabled link to be enabled.
3. The method according to claim 1, further comprising: observing, via the subnet manager, the one or more errors using subnet management packet (SMP) methods.
4. The method according to claim 1, further comprising: including, via the subnet manager, the enabled link in a subnet topology.
5. The method according to claim 1, wherein the fabric network is an InfiniBand network that includes a plurality of subnets.
6. The method according to claim 1, wherein the fabric network specifies one or more policies for defining a link in a non-routable state for observing errors.
7. The method according to claim 1, where an automated logic is used to determine whether the one or more errors exceeds the configurable threshold.
8. A non-transitory machine readable storage medium having instructions stored thereon that when executed cause a system to perform the steps comprising: monitoring, via a daemon process on a first network node in the fabric network, a local port of the first network node for one or more errors, wherein the one or more errors are associated with a link that connects the first network node and a second network node in the fabric network, wherein the link is in a first state and is configured to transfer both normal data and subnet management packet (SMP) data; in response to the one or more errors exceeding a configurable threshold, performing, by the daemon process, a disabling operation on the link to put the link into a second state, wherein the link in the second state is configured to transfer only SMP data; recording, by the daemon process, the state of the link in a persistent storage, wherein the recorded state is configured to prevent the link from being automatically enabled when the first network node is reset; and sending, via the link, one or more SMP messages to the subnet manager, wherein the one or more management data messages indicate that the link is disabled; wherein the subnet manager, upon receiving the one or more management data messages, operates to use the link to perform further discovery in the second network node, invoke the daemon process to initiate a test on the link, determine that errors at the local port of the first network node have dropped below the configurable threshold, and change the link from the second state to the first state; and wherein the daemon process, in response to the state change of the link from the second state to the first state, operates to remove the recorded state from the persistence storage when the first network node is reset.
9. The non-transitory machine readable storage medium according to claim 8, wherein the fabric network is an InfiniBand network that includes a plurality of subnets.
10. The non-transitory machine readable storage medium according to claim 8, wherein the network node operates to request the disabled link to be enabled.
11. The non-transitory machine readable storage medium according to claim 8, wherein the subnet manager observes the one or more errors using subnet management packet (SMP) methods.
12. The non-transitory machine readable storage medium according to claim 8, wherein the subnet manager operates to include the enabled link in a subnet topology.
13. The non-transitory machine readable storage medium according to claim 8, wherein the fabric network specifies one or more policies for defining a link in a non-routable state for observing errors.
14. A system for supporting automatic disabling of degraded links in a network, comprising: a computer one or more microprocessors; a fabric network executing on the computer, wherein the fabric network includes a first network node and a second network node, wherein the first network node includes a daemon process configured to monitor a local port of the network node for one or more errors associated with a link that connects the first network node and the second network node, wherein the link is in a first state and is configured to transfer both normal data and subnet management packet (SMP) data, perform a disabling operation on the link to put the link into a second state, in response to the one or more errors exceeding a configurable threshold, wherein the link in the second state is configured to transfer only SMP data, and record the state of the link in a persistent storage, wherein the recorded state is configured to prevent the link from being automatically enabled when the network node is reset; and a subnet manager in the fabric network, wherein the subnet manager operates to receive, via the link, one or more management data messages from the first network node, wherein the one or more management data messages indicate that the link is disabled; wherein the managing node, upon receiving the one or more management data messages, operates to use the link to perform further discovery in the second network node, invoke the daemon process to initiate a test on the link, determine that errors at the local port of the first network node have dropped below the configurable threshold, and change the link from the second state to the first state; and wherein the daemon process, in response to the state change of the link from the second state to the first state, operates to remove the recorded state from the persistence storage when the first network node is reset.
15. The system according to claim 14, wherein the network node operates to request the disabled link to be enabled.
16. The system according to claim 14, wherein the subnet manager operates to include the enabled link in a subnet topology.
17. The system according to claim 14, wherein the fabric network is an InfiniBand network that includes a plurality of subnets.
18. The system according to claim 14, wherein the fabric network specifies one or more policies for defining a link in a non-routable state for observing errors.
19. The system according to claim 14, wherein automated logic is used to determine whether the one or more errors exceed the configurable threshold.
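
By way of illustration only, the daemon and subnet manager behavior recited in claim 14 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `PortDaemon`, `LinkState`, `report_errors`, `run_test`, and `subnet_manager_recover` are hypothetical, the persistent storage is assumed to be a small JSON file, and the link test is reduced to comparing the latest error count against the threshold. As a simplification, the sketch removes the recorded state as soon as the link is re-enabled rather than at the next node reset.

```python
import json
from enum import Enum
from pathlib import Path


class LinkState(Enum):
    NORMAL = "normal"      # first state: normal data and SMP data
    SMP_ONLY = "smp_only"  # second state: SMP data only


class PortDaemon:
    """Hypothetical per-node daemon that watches one local port."""

    def __init__(self, store: Path, threshold: int):
        self.store = store          # assumed persistent record (a JSON file)
        self.threshold = threshold  # configurable error threshold
        self.errors = 0
        # A surviving record keeps the link disabled after a node reset.
        self.state = LinkState.SMP_ONLY if store.exists() else LinkState.NORMAL

    def report_errors(self, count: int) -> None:
        """Store the observed error count; disable the link if it is too high."""
        self.errors = count
        if self.state is LinkState.NORMAL and count > self.threshold:
            self.state = LinkState.SMP_ONLY
            self.store.write_text(json.dumps({"state": self.state.value}))

    def allows(self, is_smp: bool) -> bool:
        """SMP traffic always passes; other traffic only in the first state."""
        return is_smp or self.state is LinkState.NORMAL

    def run_test(self) -> bool:
        """Stand-in for a link test: pass if errors are below the threshold."""
        return self.errors < self.threshold

    def enable(self) -> None:
        """Return the link to the first state and drop the recorded state."""
        self.state = LinkState.NORMAL
        self.store.unlink(missing_ok=True)  # requires Python 3.8 or later


def subnet_manager_recover(daemon: PortDaemon) -> bool:
    """Hypothetical subnet manager step: test a disabled link, re-enable on pass."""
    if daemon.state is LinkState.SMP_ONLY and daemon.run_test():
        daemon.enable()
        return True
    return False
```

In this sketch, a daemon constructed while the record file exists starts in the SMP-only state, which models the recorded state preventing automatic re-enabling after a reset.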

CLAIM OF PRIORITY

This application claims the benefit of priority on U.S. Provisional Patent Application No. 61/493,330, entitled “STATEFUL SUBNET MANAGER FAILOVER IN A MIDDLEWARE MACHINE ENVIRONMENT” filed Jun. 3, 2011, which application is herein incorporated by reference.

Related Publications (1)

	Number	Date	Country
	20120311143 A1	Dec 2012	US

Provisional Applications (1)

	Number	Date	Country
	61493330	Jun 2011	US
