Embodiments described herein relate generally to packet communication networks, and particularly to methods and systems for multipath management in such networks.
Various packet networks allow sending packets from a source to a destination over multiple paths. For example, the Equal-cost multi-path (ECMP) routing scheme, implemented, e.g., in network switches or routers, allows forwarding different flows over multiple best paths with equal routing priority.
An embodiment that is described herein provides a network adapter, including a port and one or more circuits. The port is to communicate packets over a network in which switches forward the packets in accordance with tuples of the packets. The one or more circuits are to hold a user-programmable scheme specifying one or more assignments of the packets of a given flow, which is destined to a peer node coupled to the network, to multiple sub-flows having respective different tuples, assign first packets of the given flow to one or more of the sub-flows of the given flow in accordance with the user-programmable scheme, by setting one or more respective tuples of the first packets, transmit the first packets to the peer node via the port. The one or more circuits are further to monitor one or more notifications received from the network, the notifications being indicative of respective states of the sub-flows, and based on the received notifications and on the user-programmable scheme, determine an assignment of second packets of the given flow to the sub-flows, and transmit the second packets to the peer node via the port.
In some embodiments, by setting different tuples to the different sub-flows, the one or more circuits in the network adapter are to cause the switches in the network to forward at least two of the sub-flows over different paths in the network. In other embodiments, the one or more circuits are to assign tuples having different source port numbers to different respective sub-flows of the given flow. In yet other embodiments, the one or more circuits are to provide an Application Programming Interface (API) for a user to specify the user-programmable scheme.
In an embodiment, the one or more circuits are to assign the first packets to a first sub-flow, and to assign the second packets to a second sub-flow, different from the first sub-flow. In another embodiment, the one or more circuits are to assign the second packets to the second sub-flow in response to the notifications being indicative of congestion or link failure on a path in the network traversed by the first sub-flow. In yet another embodiment, the one or more circuits are to determine the assignment of the second packets in response to receiving from the network notifications indicative of underutilization in sending the first sub-flow to the peer node.
In some embodiments, the one or more circuits are to distribute third packets of the given flow among multiple sub-flows in accordance with a distribution assignment. In other embodiments, the one or more circuits are to distribute the third packets among the multiple sub-flows, when the peer node supports receiving the third packets in an order different from a transmission order by the network adapter. In yet other embodiments, the one or more circuits are to, in response to receiving one or more notifications indicative that a network performance criterion is violated while distributing the third packets, reassign the third packets to a selected sub-flow among the sub-flows of the given flow.
In an embodiment, the one or more circuits are to distribute the first packets among the sub-flows in accordance with a first distribution scheme, and to distribute the second packets among the sub-flows in accordance with a second distribution scheme, different from the first distribution scheme. In another embodiment, the switches in the network support an adaptive routing scheme in which a switch adaptively selects a path to the peer node from among multiple paths, and the one or more circuits are to mark the packets of the given flow with an indication signaling to the switches whether to locally select paths for the packets of the given flow based on the tuples of the packets or using the adaptive routing scheme. In yet another embodiment, the one or more circuits are to mark the packets of the given flow with the indication for selecting paths using the adaptive routing scheme, and in response to receiving notifications indicative of a variation among Round-Trip Time (RTT) measurements corresponding to different paths exceeding a threshold variation, mark subsequent packets of the given flow with the indication for selecting paths based on the tuples.
In some embodiments, the one or more circuits are to mark the packets with the indication for selecting paths using the tuples up to a predefined hop, and for selecting paths using the adaptive routing scheme for one or more hops following the predefined hop.
There is additionally provided, in accordance with an embodiment that is described herein, a method for communication, including, in a network adapter including a port, communicating via the port packets over a network in which switches forward the packets in accordance with tuples of the packets. A user-programmable scheme is held, specifying one or more assignments of the packets of a given flow, which is destined to a peer node coupled to the network, to multiple sub-flows having respective different tuples. First packets of the given flow are assigned to one or more of the sub-flows of the given flow in accordance with the user-programmable scheme, by setting one or more respective tuples of the first packets. The first packets are transmitted to the peer node via the port. One or more notifications received from the network are monitored, the notifications being indicative of respective states of the sub-flows. Based on the received notifications and on the user-programmable scheme, an assignment of second packets of the given flow to the sub-flows is determined, and the second packets are transmitted to the peer node via the port.
These and other embodiments will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
Embodiments that are described herein provide methods and systems for managing the traversal of packets belonging to a common flow to a destination over multiple paths in the network. The multipath management is carried out by a sender-side network adapter from which the flow in question originates.
A communication network typically comprises multiple interconnected network devices such as switches or routers, wherein each network device forwards incoming packets to their destinations using a suitable forwarding scheme.
In principle, packet routing within switches could be based on flow identifiers. For example, the switch forwards packets of different flows via different paths, e.g., for balancing traffic load across the network. In such a routing scheme, all packets belonging to a given flow traverse the same path to the destination across the network. A forwarding scheme of this sort may be suitable for packets that require strict packet ordering but may result in poor network utilization in certain cases. Moreover, conventional hash-based routing at the flow level is typically incapable of mitigating network events such as congestion and/or link failures.
In the disclosed embodiments, a sender-side Network Interface Controller (NIC) manages per-flow multipath delivery of packets that require no strict ordering at the destination. To this end, the sender-side NIC associates a flow with multiple sub-flows dedicated to that flow, and the network switches are preconfigured to forward the sub-flows to different respective egress ports (resulting in different respective paths). The sub-flows of a given flow are associated with respective sub-flow identifiers (e.g., tuples), so that packets of the given flow having different sub-flow identifiers may be routed across the network via different paths.
Consider an embodiment of a network adapter comprising a port and one or more circuits. The port is to communicate packets over a network in which switches forward the packets in accordance with tuples of the packets. The one or more circuits are to hold a user-programmable scheme specifying one or more assignments of the packets of a given flow, which is destined to a peer node coupled to the network, to multiple sub-flows having respective different tuples, assign first packets of the given flow to one or more of the sub-flows of the given flow in accordance with the user-programmable scheme, by setting one or more respective tuples of the first packets, transmit the first packets to the peer node via the port, monitor one or more notifications received from the network, the notifications being indicative of respective states of the sub-flows, based on the received notifications and on the user-programmable scheme, determine an assignment of second packets of the given flow to the sub-flows, and transmit the second packets to the peer node via the port.
By setting different tuples to the different sub-flows, the network adapter causes the switches in the network to forward at least two of the sub-flows over different paths in the network.
The network adapter may determine tuples for packets assigned to different respective sub-flows in various ways. In an example embodiment, the one or more circuits assign tuples having different source port numbers to different respective sub-flows of the given flow. For example, the four Least Significant Bits (LSBs) of the source port number define sixteen sub-flows of the given flow.
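As an illustrative sketch (the function names are hypothetical and not part of the embodiments), the mapping between a source port number and a sub-flow identifier carried in its four LSBs may be expressed as:

```python
def subflow_id(source_port: int, lsb_count: int = 4) -> int:
    """Derive a sub-flow identifier from the low bits of the source port."""
    return source_port & ((1 << lsb_count) - 1)


def subflow_source_port(base_port: int, subflow: int, lsb_count: int = 4) -> int:
    """Encode a sub-flow identifier into the low bits of a base source port."""
    mask = (1 << lsb_count) - 1
    return (base_port & ~mask) | (subflow & mask)
```

With `lsb_count=4`, sixteen distinct sub-flow identifiers (0 through 15) are available per flow, matching the example above.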
In some embodiments, the one or more circuits provide one or more Application Programming Interfaces (APIs) for a user to specify the user-programmable scheme. Using the APIs, a user may provision assignments of packets to sub-flows to meet their own requirements.
The network adapter may assign packets to sub-flows in various ways. For example, with a “static assignment” a single sub-flow is assigned to a sequence of packets, causing them to traverse a common path to the destination. A static assignment is applicable, for example, for packets that require strict packet order. With a “distribution assignment” packets in a sequence are assigned to at least two different sub-flows, causing these packets to traverse at least two different paths to the destination. With a distribution assignment, the network adapter may select sub-flows for a sequence of packets in any suitable order, e.g., using a predefined order or randomly. A distribution assignment may be applicable, for example, for balancing load in the network.
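The two assignment types may be sketched, assuming simple generator-based selectors (illustrative only, not a definitive implementation), as:

```python
import itertools
import random


def static_assignment(subflow: int):
    """All packets of the sequence map to one sub-flow (ordering preserved)."""
    while True:
        yield subflow


def round_robin_distribution(subflows):
    """Packets are spread across the sub-flows in a fixed cyclic order."""
    return itertools.cycle(subflows)


def random_distribution(subflows, rng=random):
    """Each packet independently picks a sub-flow at random."""
    while True:
        yield rng.choice(subflows)
```

Consuming the generator once per packet yields the sub-flow (and hence the tuple) for that packet.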
In some embodiments, transition among different assignments is also supported. For example, a transition from a first sub-flow to another sub-flow may be applied upon receiving from the network notifications indicative of congestion or link failure on a path in the network traversed by the first sub-flow.
Using a static assignment for a single sub-flow may result in underutilization of network resources. In the present context the term “utilization” refers to the percentage of the total bandwidth of the network being used. For example, if the network supports a data rate of 100 Gb/s but the actual data rate is 70 Gb/s, the network performs at 70% utilization. The term “underutilization” means that the network performs below a specified utilization level.
Consider for example a first path whose bandwidth is split equally between flows denoted ‘a’ and ‘b’, and a second path traversed by another flow, denoted ‘c’, that occupies only 50% of the second path’s bandwidth. In this example the network is underutilized. If flow ‘a’ is distributed equally between the first and second paths, the first path is split equally between flows ‘a’ and ‘b’, and the second path is split equally between flows ‘a’ and ‘c’, the network is fully utilized.
In some embodiments, the one or more circuits transition from a static assignment of a single sub-flow to a distribution assignment, in response to receiving from the network notifications indicative of underutilization in sending the single sub-flow to the peer node.
In some embodiments, the one or more circuits apply a distribution assignment to third packets of the given flow, when the peer node supports receiving the third packets in an order different from a transmission order by the network adapter. While distributing the third packets among the sub-flows, and in response to receiving from the network notifications indicative of a performance criterion being violated (due to the distribution), the one or more circuits reassign the third packets to a selected sub-flow among the sub-flows of the given flow.
In some embodiments, the one or more circuits are to distribute the first packets among the sub-flows in accordance with a first distribution scheme, and to distribute the second packets among the sub-flows in accordance with a second distribution scheme, different from the first distribution scheme.
In some embodiments, the switches in the network support an adaptive routing scheme in which a switch adaptively selects a path to the peer node from among multiple paths. In such embodiments, the one or more circuits are to mark the packets of the given flow with an indication signaling to the switches whether to locally select paths for the packets of the given flow based on the tuples of the packets or using the adaptive routing scheme.
In some embodiments, the one or more circuits are to mark the packets of the given flow with the indication for selecting paths using the adaptive routing scheme, and in response to receiving notifications indicative of a variation among Round-Trip Time (RTT) measurements corresponding to different paths exceeding a threshold variation, mark subsequent packets of the given flow with the indication for selecting paths based on the tuples.
Adaptive routing may be applied in only a subset of the switches. For example, the one or more circuits mark the packets with the indication for selecting paths using the tuples up to a predefined hop, and for selecting paths using the adaptive routing scheme for one or more hops following the predefined hop.
In the disclosed techniques, a sender-side network adapter assigns packets of a given flow to multiple sub-flows, each of which traverses a different path to the destination via the network. The network adapter monitors performance in delivering packets of currently assigned (and possibly other) sub-flows of the given flow and can reassign subsequent packets to the sub-flows differently, to optimize performance. The disclosed embodiments improve fabric utilization, for example, when the network traffic includes a small number of high-bandwidth flows, a configuration that is prone to result in underutilization. Moreover, the network adapter performs path transitions much faster than the underlying transport layer protocol can.
Computer system 20 comprises network nodes 24 communicating with one another over a communication network 28. In the present example, a source node 24A sends packets belonging to a common flow 26 to a destination node 24B over communication network 28.
Computer system 20 may be used, for example, in high-rate communication applications such as, for example, in High-Performance Computing (HPC) environments, data centers, storage networks, Artificial Intelligence (AI) clusters, and in providing cloud services.
Communication network 28 may comprise any suitable type of communication network, operating using any suitable communication protocols. For example, communication network 28 may comprise an Ethernet network in which packets are communicated using the Transmission Control Protocol (TCP) and the Internet Protocol (IP). As another example, communication network 28 may comprise an InfiniBand™ fabric. Communication over network 28 may be based, for example, on the Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) protocol, which is an RDMA protocol (implementing an InfiniBand transport layer) over IP and Ethernet.
Communication network 28 comprises multiple network devices 40, interconnected by links 42 in any suitable topology. In the present example network devices 40 comprise network switches, also referred to simply as “switches” for brevity. Alternatively or additionally, communication network 28 may comprise other suitable network devices such as routers.
In some embodiments, a Subnet Manager (SM) 50 is coupled to communication network 28. Among other tasks, SM 50 configures switches 40 and collects information indicative of network performance, network events causing degraded performance, and the like. Such information may be used by a user for optimizing network performance.
Network nodes 24 (including source node 24A and destination node 24B) are coupled to communication network 28 using a suitable network adapter or Network Interface Controller (NIC) 44. In
In the example of
In general, a source node (e.g., 24A) may send over communication network 28 one or more flows, and a destination node (e.g., 24B) may receive from the communication network one or more flows.
In some embodiments, switches 40 forward packets using a hash-based forwarding scheme that applies a sub-flow level routing scheme, in contrast to conventional flow level hash-based routing schemes such as the Equal-cost multi-path (ECMP) scheme.
In some embodiments, the sub-flow level hash-based routing scheme employed by switches 40 specifies for a given flow a group of multiple egress ports corresponding to respective sub-flows. In such embodiments, packets of a common flow may be assigned to different sub-flows of that flow by setting tuples of the packets. Since different tuples of the sub-flows produce different hash results with high probability, packets assigned to different sub-flows traverse different paths to the destination through the communication network. The tuples may comprise, for example, five-tuples that differ from one another, e.g., in the User Datagram Protocol (UDP) source port number and share the same source and destination addresses and destination port. In alternative embodiments other methods for setting the tuples to assign packets to sub-flows can also be used.
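The effect of varying only the UDP source port on hash-based egress port selection can be illustrated by the following sketch; SHA-256 stands in here for whatever hash function a particular switch implements, and the tuple layout is illustrative:

```python
import hashlib


def tuple_hash_egress(five_tuple, num_ports: int) -> int:
    """Map a packet's five-tuple to one of the group's egress ports via a hash.

    Packets whose tuples differ only in the source port still hash, with
    high probability, to different egress ports (and hence different paths).
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_ports
```

Because the hash is deterministic, all packets carrying the same tuple follow the same path, while sub-flows with different source ports are very likely to spread across several egress ports.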
In the example of
As will be described below, NIC 44A may assign to each packet of a given flow one of the sub-flows of that flow, to optimize performance e.g., based on states of the sub-flows reported by elements of the network.
As noted above, NIC 44A is coupled to communication network 28 on one side, and to host 32A of source node 24A on the other side. NIC 44A thus mediates between the host and the communication network.
In the receive direction, NIC 44A receives packets from communication network 28 via an ingress port 104. A receive (Rx) pipeline 108 processes the received packets and sends the processed packets to host 32A. In the transmit direction, NIC 44A receives from host 32A packets for transmission. A transmit (Tx) pipeline 116 processes the packets and transmits them to communication network 28 via an egress port 112.
NIC 44A comprises a multipath controller 120 (also referred to simply as “controller” for brevity) that controls the traversal of packets belonging to a common flow, via multiple paths in communication network 28, to the same destination node (e.g., 24B). To this end, the controller assigns packets of the common flow to multiple sub-flows of that flow, wherein the sub-flows are associated with different respective tuples carried in the packets of the flow. Moreover, switches 40 in communication network 28 are configured to forward the different sub-flows via different paths to the destination, based on the tuples. The tuples thus serve as sub-flow identifiers. In some embodiments, NIC 44A applies a suitable hash function to the tuple to generate a corresponding sub-flow identifier.
In some embodiments, controller 120 comprises a memory 122 holding for a given flow (e.g., flow 26) a user-programmable scheme 124 and a flow context 128. The user-programmable scheme 124 specifies one or more assignments of packets belonging to the given flow to multiple respective sub-flows having respective different tuples. Upon receiving in Tx pipeline 116 a packet from the host, controller 120 selects a sub-flow (or a tuple) for the packet, based at least on information extracted from the packet's header, and on one of the assignments in user-programmable scheme 124.
In some embodiments, controller 120 selects a sub-flow for the packet also based on notifications received from the network via Rx pipeline 108, the notifications being indicative of respective states of the sub-flows of the given flow. Controller 120 thus monitors the network performance per sub-flow and can reassign subsequent packets of the given flow to the sub-flows so as to improve the performance.
In some embodiments, controller 120 reassigns the packets to the sub-flows, e.g., transitions from the currently used assignment to a different assignment, in response to detecting that a performance criterion has been violated. In one such embodiment, the controller detects a performance criterion violation when the path traversed by the current sub-flow is congested or contains a failing link. In another embodiment, the controller detects a performance criterion violation when the delay along a path traversed by the current sub-flow exceeds a delay threshold.
Various types of notifications can be used for monitoring the network performance, e.g., end-to-end notifications at the sub-flow level. For example, with the RDMA over Converged Ethernet (RoCE) protocol, the destination node may send back to the source node a Negative Acknowledgement (NACK) or a Congestion Notification Packet (CNP). A NACK may be indicative of missing packets in a sub-flow. A CNP is indicative of congestion or a link failure occurring along the path traversed by a sub-flow. As another example, in an embodiment, the destination node sends telemetry information to the source node such as Round-Trip Time (RTT) measurements.
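A minimal sketch of how a sender might aggregate such notifications into per-sub-flow states follows; the notification kinds, field names, and state labels are illustrative assumptions, not the embodiments' wire formats:

```python
from dataclasses import dataclass


@dataclass
class Notification:
    kind: str          # "NACK", "CNP", or "RTT" (illustrative kinds)
    subflow: int
    rtt_us: float = 0.0


def subflow_states(notifications, rtt_threshold_us: float) -> dict:
    """Summarize per-sub-flow state: a CNP marks congestion, a NACK marks
    loss, and an RTT report above the threshold marks the sub-flow as slow."""
    states = {}
    for n in notifications:
        if n.kind == "CNP":
            states[n.subflow] = "congested"
        elif n.kind == "NACK":
            states[n.subflow] = "lossy"
        elif n.kind == "RTT" and n.rtt_us > rtt_threshold_us:
            states[n.subflow] = "slow"
    return states
```

Sub-flows with no adverse notifications are simply absent from the resulting map, i.e., presumed healthy.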
In some embodiments, NIC 44A monitors the performance of multiple sub-flows of the same flow. The NIC may monitor sub-flows that are not used by the current assignment, by sending RTT probe packets on these sub-flows. In such embodiments controller 120 may transition from the current sub-flow to another sub-flow when the RTT measured for the other sub-flow is lower than the RTT measured for the current sub-flow.
Another type of notification, indicative of a link failure, may be reported by network switches 40 to the source node; a switch detects the link failure, for example, by identifying a corresponding timeout expiration.
Tx pipeline 116 comprises a packet modifier 132 that, based on the sub-flow selected by controller 120, sets a corresponding tuple value in the header of the packet. The Tx pipeline transmits the packet output by the packet modifier to communication network 28 via egress port 112.
In some embodiments, controller 120 manages stateful assignments of packets to sub-flows of a given flow using flow context 128. The flow context may store any suitable information required for managing per-flow multipath, such as one or more sub-flow identifiers of the sub-flows assigned to recently transmitted packets of the given flow, the assignment in user-programmable scheme 124 currently selected for the given flow, network states of one or more recently assigned sub-flows, and the like. Controller 120 uses the information in flow context 128 in selecting subsequent assignments and sub-flows.
In some embodiments, controller 120 comprises one or more user Application Programming Interfaces (APIs) 136. User APIs 136 allow flexible provisioning of assignments to user-programmable scheme 124 for meeting users' own requirements.
In some embodiments, based on the information collected from the network by SM 50, a user may define suitable assignments of packets to sub-flows of a given flow, and provision these assignments to user-programmable scheme 124 via user APIs 136.
In describing
In some embodiments, the sub-flows of a distribution assignment are selected with the same priority. In other embodiments, the NIC selects different sub-flows of the distribution assignment with different respective priorities, e.g., by associating the sub-flows with respective weights. For example, the weights may be assigned depending on respective path loads, e.g., a loaded path gets a lower priority than a less loaded path.
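Weighted sub-flow selection can be sketched as follows (an illustrative sketch; the actual selection hardware need not use this algorithm):

```python
import random


def pick_subflow(weights: dict, rng=random) -> int:
    """Select a sub-flow with probability proportional to its weight.

    Weights may be set inversely to observed path load, so a sub-flow on
    a loaded path is chosen less often than one on a lightly loaded path.
    """
    subflows = list(weights)
    return rng.choices(subflows, weights=[weights[s] for s in subflows], k=1)[0]
```

Setting equal weights reduces this to the equal-priority case described above.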
Some of the assignments in user-programmable scheme 124 specify rules for transitioning among assignments, e.g., depending on notifications received from the communication network. Such rules may specify, for example, conditions for starting and stopping the application of a distribution assignment, conditions for transitioning from one static assignment to another static assignment, and the like.
The assignment rule in
In some embodiments, NIC 44A searches for a suitable sub-flow by attempting several static assignments. The search process may be triggered in response to the NIC detecting that the delay of the path traversed by the current sub-flow exceeds a threshold delay. The search process continues, for example, until finding a sub-flow for which the delay is lower than a threshold delay.
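The search over static assignments can be sketched as below; `measure_delay` is a hypothetical callback standing in for whatever per-sub-flow delay measurement (e.g., RTT probing) the NIC performs:

```python
def find_acceptable_subflow(subflows, measure_delay, delay_threshold):
    """Try static assignments one by one until a sub-flow's measured
    delay falls below the threshold; return None if none qualifies."""
    for sf in subflows:
        if measure_delay(sf) < delay_threshold:
            return sf
    return None
```

In practice the search may be triggered only after the current sub-flow's delay exceeds the threshold, as described above.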
The example assignments and rules depicted in
The method will be described as executed by NIC 44A of
The method begins at a provisioning step 200, with NIC 44A being provisioned with one or more assignments of packets to sub-flows into user-programmable scheme 124 via user API(s) 136. For example, the assignments provisioned for a given flow may comprise static assignments, distribution assignments and rules for transitioning among the static and distribution assignments, as described above. Further at step 200, controller 120 selects from among the provisioned assignments, an initial assignment, e.g., a default assignment predefined for the given flow, and starts applying the initial assignment. In an embodiment, the controller stores an identifier of the assignment currently used for the given flow in flow context 128.
At a reception from host step 204, Tx pipeline 116 of NIC 44A receives from host 32A a packet belonging to the given flow, for transmission to a peer node, e.g., destination node 24B. At a sub-flow selection step 208, the controller determines a sub-flow for the packet based on the assignment currently used for the given flow, and possibly on notifications received from the network via Rx pipeline 108.
At a transmission step 212, packet modifier 132 sets the tuple of the packet to the tuple value associated with the sub-flow determined at step 208, and Tx pipeline 116 transmits the packet to the communication network via egress port 112.
At a monitoring step 216, controller 120 monitors notifications received from the communication network via the Rx pipeline, the notifications being indicative of states of one or more sub-flows of the given flow corresponding to the current assignment.
At a performance query step 220, the controller checks whether the performance level in transmitting the given flow to the peer node using the current assignment is acceptable, e.g., based on the states of the sub-flow(s) reported in the notifications. For example, the controller checks whether any predefined performance criterion is violated, as described above. When at step 220 the performance level is unacceptable (e.g., at least one performance criterion has been violated), the controller proceeds to an assignment transitioning step 224, at which the controller selects another assignment, different from the current assignment, and applies the other assignment to subsequent packets of the given flow.
Following step 220 when the performance level is acceptable, and following step 224, controller 120 loops back to step 204 to receive another packet of the given flow from the host.
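The per-packet loop of steps 204 through 224 can be summarized in the following sketch; `send` and `performance_ok` are hypothetical callbacks standing in for the Tx pipeline and the notification-based performance check:

```python
def transmit_flow(packets, assignments, send, performance_ok) -> int:
    """Sketch of steps 204-224: select a sub-flow from the current
    assignment, transmit the packet, and transition to another
    assignment when performance becomes unacceptable."""
    current = 0  # step 200: start with the initial (default) assignment
    for pkt in packets:
        subflow = assignments[current](pkt)   # step 208
        send(pkt, subflow)                    # step 212
        if not performance_ok(subflow):       # steps 216-220
            current = (current + 1) % len(assignments)  # step 224
    return current
```

The sketch cycles through assignments in order; the embodiments instead select the next assignment per the rules in user-programmable scheme 124.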
Handling Packets Requiring Ordered Reception
In some embodiments, flow 26 contains multi-packet messages. When packets of a common message are assigned to the same sub-flow (e.g., in accordance with a static assignment), these packets arrive at destination node 24B in the same order in which they were transmitted by source node 24A. For example, even when communication network 28 comprises a lossless fabric (in which case no packets are dropped by the switches), some RDMA operations such as “RDMA send” are required to retain order among the packets. NIC 44A does not assign packets of such operations to sub-flows using a distribution assignment, to prevent the arrival of the packets at the destination in an order different from the order in which they were transmitted by the source node.
As noted above, in some embodiments, sender-side NIC 44A refrains from applying distribution assignments to packets for which the remote node does not support out of order reception. Usage of distribution assignments may be enabled or disabled in the NIC, e.g., via user API 136, by marking packets as supporting or not supporting out of order reception, or by both mechanisms. For example, NIC 44A may disable usage of a distribution assignment for packets supporting out of order reception, e.g., when such a distribution assignment results in poor performance.
The method begins at an input step 250, with controller 120 receiving from Tx pipeline 116 a header of a packet (or part thereof) belonging to a given flow. At an out of order query step 254, the controller checks whether the received packet is marked as supporting out of order reception, and if so, the controller proceeds to a distribution query step 258, at which the controller checks whether the distribution assignment is enabled for the given flow or not.
When at step 254 the packet does not support out of order reception or when at step 258 the distribution assignment is disabled, ordered packet delivery is maintained, by the controller assigning to the packet a sub-flow based on a static assignment, at a static assignment application step 262. When at step 254 the packet supports out of order reception, and in addition, at step 258 the distribution assignment is enabled for the packet, the controller determines a sub-flow for the packet based on the distribution assignment, at a distribution assignment application step 266.
Following each of steps 262 and 266, controller 120 outputs the sub-flow determined for the packet, at an output step 270, and the method terminates.
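The decision of steps 254 through 266 reduces to a single conjunction, sketched here for clarity (the string labels are illustrative):

```python
def select_assignment_type(packet_supports_ooo: bool,
                           distribution_enabled: bool) -> str:
    """Steps 254-266: use a distribution assignment only when the packet
    supports out of order reception AND distribution is enabled for the
    flow; otherwise fall back to a static assignment to preserve order."""
    if packet_supports_ooo and distribution_enabled:
        return "distribution"  # step 266
    return "static"            # step 262
```

Any other combination of the two conditions yields ordered, single-path delivery via a static assignment.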
In some embodiments, in addition to a hash-based routing scheme, network switches 40 support adaptive determination of the optimal path a packet should follow to its destination through the network. Such a routing scheme is also referred to as an Adaptive Routing (AR) scheme. For example, some network switch products by Nvidia support Nvidia's AR scheme.
NIC 44A may mark the packets of a given flow with an indication signaling to the switches whether to locally select paths for the packets of the given flow based on the tuples of the packets (sub-flow hash-based forwarding) or using the adaptive routing scheme.
In some embodiments, NIC 44A enables or disables AR per flow. For example, the NIC holds for each flow a respective AR enable/disable state in flow context 128, and marks packets of the flow based on the AR state, as will be described below.
In applying the AR scheme for a packet of a given flow, switch 40 selects for the packet the least loaded output port (from a set of outgoing ports leading to the destination of the given flow) based on egress port queue depth and path priority (e.g., the shortest path has the highest priority).
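The AR port-selection rule can be sketched as a lexicographic minimization (an illustrative sketch; real switches apply this in hardware with their own metrics):

```python
def ar_select_port(candidates) -> int:
    """Pick the egress port with the smallest queue depth, breaking ties
    by path priority (higher priority wins, e.g., the shortest path).

    candidates: iterable of (port, queue_depth, priority) triples for the
    set of outgoing ports leading to the flow's destination."""
    return min(candidates, key=lambda c: (c[1], -c[2]))[0]
```

Sorting by `(queue_depth, -priority)` makes queue depth the primary criterion and path priority the tie-breaker, matching the description above.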
In some embodiments, only a subset of the switches in the communication network supports or enables the AR scheme. For example, the NIC marks the packets with the indication for selecting paths using the tuples up to a predefined hop, and for selecting paths using the adaptive routing scheme for one or more hops following the predefined hop.
Forwarding in the switches using the AR scheme typically performs well in various scenarios, but may result in poor performance in others, e.g., when the underlying topology is highly complex. For example, applying AR in the switches for a given flow may cause many out of order reception events, which in turn degrade performance in delivering packets of the given flow to the destination. It is therefore sometimes advantageous to disable AR for packets of flows suffering performance degradation due to the AR.
In some embodiments, since with AR packets sent over different paths may arrive at the destination in an order different from the order in which they were transmitted by the source node, AR is applied in switch 40 only for packets for which the destination allows out of order reception. When switch 40 applies AR, the switch ignores tuple information that the NIC may have set in the packets in assigning the packets to sub-flows. For example, a packet may be assigned by the NIC to some sub-flow associated with a corresponding first path, but the switch applying AR may decide to forward this packet over a different second path having an available bandwidth larger than that of the first path.
The method of
The method begins with NIC 44A enabling AR for a given flow, at an AR configuration step 300. Consequently, packets of the given flow for which the destination supports out of order reception will be forwarded by switches 40 using the AR scheme and not using the sub-flow hash-based routing scheme.
At a reception step 304, NIC 44A receives from host 32A a packet of the given flow for transmission. At a query step 308, NIC 44A checks whether out of order reception is allowed for the packet, and if so, marks the packet as supporting AR, at an AR support marking step 312, and transmits the packet to communication network 28. In some embodiments, the packets have separate fields in the header for marking a packet as supporting or not supporting out of order reception and for marking the packet as supporting or not supporting AR. In other embodiments, a common field in the header serves for both types of markings.
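The marking of steps 308, 312 and 324 can be sketched as follows, under stated assumptions: the dictionary-based header and the field names `ooo` and `ar` are hypothetical, and a dedicated AR field is assumed (some embodiments instead share a common field for both markings, as noted above).

```python
def mark_packet(header: dict, ooo_allowed: bool) -> dict:
    """Steps 308/312/324: set the out-of-order and AR-support indications
    in the packet header before transmission."""
    marked = dict(header)           # leave the caller's header untouched
    marked["ooo"] = ooo_allowed     # destination tolerates reordering?
    # Step 312 vs. 324: AR support is marked only when the destination
    # allows out of order reception.
    marked["ar"] = ooo_allowed
    return marked
```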
At a performance evaluation step 316, the NIC evaluates the performance in delivering the given flow to its destination, e.g., based on notifications received from the communication network. For example, the received notifications are indicative of Round-Trip Time (RTT) measurements corresponding to multiple respective paths selected by the AR scheme.
At an AR disabling step 320, when NIC 44A detects that the performance level for the given flow under the AR scheme is unacceptable, e.g., by detecting that the variation among the RTT measurements corresponding to the different paths exceeds a threshold variation, the NIC disables AR for the given flow in the NIC, and determines for the given flow a static assignment.
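The variation-based criterion of step 320 can be sketched as below. This is one possible heuristic consistent with the description, not the patent's implementation; the name `should_disable_ar`, the spread-based measure of variation, and any threshold value are assumptions.

```python
def should_disable_ar(rtts_us: list[float], max_variation_us: float) -> bool:
    """Return True when the spread among per-path RTT measurements
    (received in notifications from the network) exceeds the threshold
    variation, indicating AR should be disabled for the flow."""
    return (max(rtts_us) - min(rtts_us)) > max_variation_us
```

For example, with per-path RTTs of 100, 105 and 400 microseconds and a 50-microsecond threshold, the criterion indicates disabling AR, whereas RTTs of 100, 105 and 110 microseconds do not.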
When at step 308 above the destination does not allow out of order reception for the packet, the NIC marks the packet as not supporting AR, at a non-AR marking step 324, and, at a static assignment step 328, assigns the packet to a sub-flow in accordance with a static assignment.
Following each of steps 320 and 328 the method loops back to step 304 to receive another packet of the given flow from the host.
In the example method of
The computer system configuration of
Some elements of NIC 44A, such as Rx pipeline 108, Tx pipeline 116 and multipath controller 120, may be implemented in hardware, e.g., in one or more Application-Specific Integrated Circuits (ASICs) or Field-Programmable Gate Arrays (FPGAs). Additionally or alternatively, some elements of the NIC can be implemented using software, or using a combination of hardware and software elements.
Elements that are not necessary for understanding the principles of the present application, such as various interfaces, addressing circuits, timing and sequencing circuits and debugging circuits, have been omitted from
Memory 122 may comprise any suitable storage device using any suitable storage technology, such as, for example, a Random Access Memory (RAM) or a nonvolatile memory such as a Flash memory.
In some embodiments, some of the functions of multipath controller 120 may be carried out by a general-purpose processor, which is programmed in software to carry out the functions described herein. The software may be downloaded to the processor in electronic form, over a network, for example, or it may, alternatively or additionally, be provided and/or stored on non-transitory tangible media, such as magnetic, optical, or electronic memory.
The various elements of NIC 44A such as Rx pipeline 108, Tx pipeline 116, and controller 120 are collectively referred to in the claims as “one or more circuits”.
The embodiments described above are given by way of example, and other suitable embodiments can also be used.
It will be appreciated that the embodiments described above are cited by way of example, and that the following claims are not limited to what has been particularly shown and described hereinabove. Rather, the scope includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art. Documents incorporated by reference in the present patent application are to be considered an integral part of the application except that to the extent any terms are defined in these incorporated documents in a manner that conflicts with the definitions made explicitly or implicitly in the present specification, only the definitions in the present specification should be considered.
Number | Date | Country | |
---|---|---|---|
20240080266 A1 | Mar 2024 | US |