TECHNOLOGIES FOR MEDIUM GRAINED ADAPTIVE ROUTING IN HIGH-PERFORMANCE NETWORK FABRICS

Description

BACKGROUND

High performance computing (HPC) clusters, cloud computing datacenters, and other large-scale computing networks may communicate over a high-speed input/output fabric such as an InfiniBand™ fabric. The InfiniBand™ architecture may transfer data using switched, point-to-point channels between endnodes. In the InfiniBand™ architecture, an endnode may be identified within a subnet using a 16-bit local identifier (LID). Routing in InfiniBand™ networks is distributed, based on forwarding tables stored in each switch. The forwarding table of an InfiniBand™ switch may store a single destination port per destination LID. Therefore, routing in InfiniBand™ may be static and deterministic.

Congestion in network communications may occur when demand for a network link exceeds available bandwidth or other network resources. In InfiniBand™ networks, a congestion control agent may monitor for network congestion and communicate with network hosts to reduce data injection rates for network traffic causing congestion.

BRIEF DESCRIPTION OF THE DRAWINGS

The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a simplified block diagram of at least one embodiment of a system for medium grained adaptive routing;

FIG. 2 is a simplified block diagram of at least one embodiment of various environments that may be established by the system of FIG. 1;

FIG. 3 is a simplified flow diagram of at least one embodiment of a method for medium grained adaptive routing that may be executed by a managed network device of the system of FIGS. 1 and 2; and

FIG. 4 is a schematic diagram of various data tables that may be maintained by a managed network device of FIGS. 1 and 2.

DETAILED DESCRIPTION OF THE DRAWINGS

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

References in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described. Additionally, it should be appreciated that items included in a list in the form of “at least one of A, B, and C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, items listed in the form of “at least one of A, B, or C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).

The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).

In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.

Referring now to FIG. 1, in one embodiment, a system 100 for medium grained adaptive routing includes a number of managed network devices 102 and a number of computing nodes 104 communicating via several fabric links 106. The managed network devices 102, the computing nodes 104, and other attributes of the system 100 may be managed by one or more fabric managers 108. In use, as discussed in more detail below, each computing node 104 may transmit data packets over a fabric link 106 to a managed network device 102. Each data packet may include a destination local identifier (DLID) that identifies a destination computing node 104. The managed network device 102 examines the DLID to determine whether the fabric link 106 of the statically routed destination port of the data packet is congested. Congestion typically occurs for fabric links 106 between managed network devices 102, rather than between a computing node 104 and a managed network device 102. If congested, the managed network device 102 may look up a port group for the DLID and select a new destination port of the port group. Each port group includes two or more ports of the managed network device 102, and the port groups may overlap (i.e., multiple port groups may include the same port). The fabric manager 108 may configure the port groups and port group forwarding tables of each managed network device 102 of the system 100. Thus, in the system 100, the fabric manager 108 may provide global knowledge regarding fabric topology and usage policy while real time traffic monitoring and adaptive routing is performed by each managed network device 102. Thus, adaptive routing provided by the system 100 may support scaling to large numbers of managed network devices 102. Accordingly, although illustrated as including three computing nodes 104a through 104c and two managed network devices 102a and 102b, it should be understood that the system 100 may include many more managed network devices 102 and computing nodes 104.

Each managed network device 102 may be embodied as any network device capable of forwarding or controlling fabric traffic, such as a managed switch. The illustrative managed network device 102 includes a number of fabric ports 120, a switch logic 122, and a management logic 124. Each fabric port 120 may be connected to a fabric link 106, which in turn may be connected to a remote device such as a computing node 104 or another managed network device 102. The illustrative managed network device 102 includes three fabric ports 120a through 120c; however, in other embodiments the managed network device 102 may include additional or fewer ports 120 to support a different number of fabric links 106.

The switch logic 122 may be embodied as any hardware, firmware, software, or combination thereof configured to forward data packets received on the ports 120 to appropriate destination ports 120. For example, the switch logic 122 may be embodied as a shared memory switch or a crossbar switch, and may include a scheduler, packet processing pipeline, linear forwarding tables, port group forwarding tables, port group tables, and/or any other switching logic. In some embodiments, the switch logic 122 may be embodied as one or more application-specific integrated circuits (ASICs).

The management logic 124 may be embodied as any control circuit, microprocessor, or other logic block that may be used to configure and control the managed network device 102. For example, the management logic 124 may initialize the managed network device 102 and its components, control the configuration of the managed network device 102 and its components, provide a testing interface to the managed network device 102, or provide other management functions. The management logic 124 may be configured by changing the values of a number of data tables including a port group forwarding table and/or a port group table. The fabric manager 108 may communicate with the management logic 124 using an in-band management interface by transmitting specially formatted management datagrams (MADs) over the fabric links 106. Additionally or alternatively, the management logic 124 may communicate with the fabric manager 108 over a management interface such as one or more PCI Express host interfaces, a test interface, or one or more low-speed interfaces such as an I2C interface, a JTAG interface, an SPI interface, an MDIO interface, an LED interface, or a GPIO interface.

Each computing node 104 may be embodied as any type of computation or computer device capable of performing the functions described herein, including, without limitation, a computer, a server, a rack-mounted server, a blade server, a network appliance, a web appliance, a multiprocessor system, a distributed computing system, a processor-based system, a mobile computing device, and/or a consumer electronic device. As shown in FIG. 1, each computing node 104 illustratively includes a processor 140, an input/output subsystem 144, a memory 146, a data storage device 148, and communication circuitry 150. Of course, the computing node 104 may include other or additional components, such as those commonly found in a computer (e.g., various input/output devices), in other embodiments. Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component. For example, the memory 146, or portions thereof, may be incorporated in the processor 140 in some embodiments.

The processor 140 may be embodied as any type of processor capable of performing the functions described herein. For example, the processor 140 may be embodied as a single or multi-core processor(s), digital signal processor, microcontroller, or other processor or processing/controlling circuit. The processor 140 further includes a host fabric interface 142. The host fabric interface 142 may be embodied as any communication interface, such as a network interface controller, communication circuit, device, or collection thereof, capable of enabling communications between the processor 140 and other remote computing nodes 104 and/or other remote devices over the fabric links 106. The host fabric interface 142 may be configured to use any one or more communication technology and associated protocols (e.g., the Intel® Omni-Path Architecture) to effect such communication. Although illustrated as including a single processor 140, it should be understood that each computing node 104 may include multiple processors 140, and each processor 140 may include an integrated host fabric interface 142.

Similarly, the memory 146 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. In operation, the memory 146 may store various data and software used during operation of the computing node 104 such as operating systems, applications, programs, libraries, and drivers. The memory 146 is communicatively coupled to the processor 140 via the I/O subsystem 144, which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 140, the memory 146, and other components of the computing node 104. For example, the I/O subsystem 144 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, firmware devices, communication links (i.e., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.) and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 144 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with the processor 140, the memory 146, and other components of the computing node 104, on a single integrated circuit chip. The data storage device 148 may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices.

The communication circuitry 150 of the computing node 104 may be embodied as any communication interface, such as a communication circuit, device, or collection thereof, capable of enabling communications between the computing node 104 and one or more remote computing nodes 104, managed network devices 102, switches, remote hosts, or other devices. The communication circuitry 150 may be configured to use any one or more communication technology (e.g., wired or wireless communications) and associated protocols (e.g., Intel® Omni-Path Architecture, InfiniBand®, Ethernet, Bluetooth®, Wi-Fi®, WiMAX, etc.) to effect such communication. In particular, the communication circuitry 150 includes a port 152 that connects to a fabric link 106. Although illustrated as including a single port 152, in some embodiments each computing node 104 may include multiple ports 152.

Each of the fabric links 106 may be embodied as any point-to-point communication link capable of connecting two ports 120, 152 of the system 100. For example, a fabric link 106 may connect a port 152 of a computing node 104 with a port 120 of a managed network device 102, may connect two ports 120 of two managed network devices 102, and so on. Each fabric link 106 allows communications in both directions. Each fabric link 106 may be embodied as a serial data communication link such as a copper cable, copper backplane, fiber optic cable, or silicon photonics link, and may include multiple communication lanes (e.g., four lanes) to increase total bandwidth. Each fabric link 106 may signal data at a wire speed such as 12.5 Gb/s or 25.78125 Gb/s.
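By way of illustration only, the raw signaling rate of a multi-lane fabric link 106 may be estimated by multiplying the per-lane wire speed by the number of lanes. The following Python sketch assumes that all lanes signal at the same rate and does not model encoding or protocol overhead, so the result is not a payload bandwidth; the function name is hypothetical.

```python
def aggregate_signaling_rate_gbps(lane_rate_gbps: float, lanes: int) -> float:
    """Raw signaling rate of a multi-lane link, in Gb/s.

    Assumption: every lane runs at the same wire speed. Encoding and
    protocol overhead are deliberately not modeled.
    """
    if lanes < 1:
        raise ValueError("a link needs at least one lane")
    return lane_rate_gbps * lanes


if __name__ == "__main__":
    # The two example wire speeds named in the description, with four lanes.
    for rate in (12.5, 25.78125):
        total = aggregate_signaling_rate_gbps(rate, 4)
        print(rate, "Gb/s x 4 lanes =", total, "Gb/s")
```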

The fabric manager 108 is configured to initialize and otherwise manage the managed network devices 102, computing nodes 104, and other hosts, gateways, and/or other devices of the system 100. The fabric manager 108 may be embodied as any type of server computing device, network device, or collection of devices, capable of performing the functions described herein. In some embodiments, the system 100 may include multiple fabric managers 108 of which a primary fabric manager 108 may be selected. As such, the fabric manager 108 may be embodied as a single server computing device or a collection of servers and associated devices. Accordingly, although the fabric manager 108 is illustrated in FIG. 1 as embodied as a single computing device, it should be appreciated that the fabric manager 108 may be embodied as multiple devices cooperating together to facilitate the functionality described below.

Referring now to FIG. 2, in an illustrative embodiment, each managed network device 102 establishes an environment 200 during operation. The illustrative environment 200 includes a packet ingress module 202, a static route module 204, a congestion monitoring module 206, an adaptive route module 208, and a management module 210. The various modules of the environment 200 may be embodied as hardware, firmware, software, or a combination thereof. For example, the various modules, logic, and other components of the environment 200 may form a portion of, or otherwise be established by, the switch logic 122, the management logic 124, or other hardware components of the managed network device 102. As such, in some embodiments, any one or more of the modules of the environment 200 may be embodied as a circuit or collection of electrical devices (e.g., a packet ingress circuit, a static route circuit, etc.).

The packet ingress module 202 is configured to receive and process data packets from the ports 120. In particular, the packet ingress module 202 is configured to extract a destination local identifier (DLID) from a received data packet. The DLID may be embodied as a binary value having a configurable length (e.g., 32, 24, 20, or 16 bits wide, or any other appropriate width). The DLID identifies the destination end point (e.g., a destination computing node 104) of the data packet.

The static route module 204 is configured to determine a statically routed destination port 120 of the managed network device 102 as a function of the DLID. The static route module 204 may, for example, look up the destination port 120 in a forwarding table using the DLID. The static route module 204 may be configured to forward the data packet to the statically routed destination port 120 if that destination port 120 is not congested.

The congestion monitoring module 206 is configured to determine whether the statically routed destination port 120 is congested. The congestion monitoring module 206 may use any appropriate congestion metric or other monitoring technique to determine whether the destination port 120 is congested. In some embodiments, the congestion monitoring module 206 may determine whether a particular subdivision of fabric link 106 for the destination port 120 is congested (e.g., a particular virtual lane, service channel, or associated service level).

The adaptive route module 208 is configured to determine a port group based on the DLID. Each port group identifies two or more ports 120 of the managed network device 102. Port groups may overlap, and each DLID is associated with exactly one port group. The adaptive route module 208 is further configured to dynamically select a destination port 120 of the port group when the statically routed destination port 120 is congested and then forward the data packet to the dynamically selected destination port 120. The adaptive route module 208 may use any one or more strategies for selecting the destination port 120 (e.g., random selection, greedy/least-loaded selection, and/or greedy random selection).

The management module 210 is configured to manage the configuration of the managed network device 102. The management module 210 may store or otherwise manage one or more configuration registers, data tables, or other management information that may be used to configure the managed network device 102. For example, in some embodiments, the management module 210 may manage a linear forwarding table, a multicast forwarding table, a port group forwarding table, and/or a port group table. The management module 210 may be configured to receive commands, data, and other management information from the fabric manager 108.

Referring now to FIG. 3, in use, a managed network device 102 may execute a method 300 for medium grained adaptive routing. The method 300 begins in block 302, in which the managed network device 102 receives a data packet on a port 120 and extracts a destination local identifier (DLID) from the data packet. The DLID identifies the destination of the data packet (e.g., the destination computing node 104). The data packet may be embodied as any appropriate collection of binary data including the DLID. The DLID may be embodied as a binary value having a configurable length (e.g., 32, 24, 20, or 16 bits). The managed network device 102 may be configured to extract the correctly sized DLID, for example by the fabric manager 108. The width of the DLID may be configured globally for the managed network device 102 and/or on a per-port 120 basis. In some embodiments, a subset of possible DLIDs may be assigned to a multicast and/or collective address space, and certain DLIDs may have predetermined meanings (e.g., an uninitialized address or a permissive LID). Those DLIDs may be processed using dedicated multicast, collective, or other operations, which are not shown in the method 300 for clarity.
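By way of example and not limitation, the extraction of block 302 with a globally configured DLID width and an optional per-port override may be sketched in Python as follows. The sketch assumes that the DLID occupies the low-order bits of a header field that has already been parsed into an integer; that layout, the table contents, and all names shown are hypothetical.

```python
from typing import Dict

# Hypothetical configuration, e.g. as written by a fabric manager.
DEFAULT_DLID_WIDTH = 16                    # global width in bits
PORT_DLID_WIDTH: Dict[int, int] = {2: 24}  # per-port override: port 2 uses 24 bits


def dlid_width_for_port(ingress_port: int) -> int:
    """Return the per-port DLID width, falling back to the global width."""
    return PORT_DLID_WIDTH.get(ingress_port, DEFAULT_DLID_WIDTH)


def extract_dlid(header_field: int, ingress_port: int) -> int:
    """Mask out the DLID from a header field.

    Assumption: the DLID is held in the low-order bits of `header_field`.
    """
    width = dlid_width_for_port(ingress_port)
    return header_field & ((1 << width) - 1)
```

The same header field thus yields a 16-bit DLID on a port using the global width and a 24-bit DLID on a port configured with the override.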

In block 304, the managed network device 102 determines the statically routed destination port 120 based on the DLID. The statically-routed destination port 120 may be a predetermined destination port 120 of the managed network device 102 that has been associated with the DLID. The managed network device 102 may look up the statically routed destination port 120 in one or more data tables. Those data tables may be configured or otherwise maintained by the fabric manager 108. In some embodiments, in block 306 the managed network device 102 may look up the destination port 120 in a linear forwarding table. The managed network device 102 may use the DLID as an index into the linear forwarding table and retrieve a port number or other data identifying the destination port 120.
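As a minimal illustrative sketch of blocks 304 and 306, the linear forwarding table may be modeled as an array in which the DLID is the index and the stored value is a port number. The table contents and the function name below are hypothetical, and the range check is an assumption of the sketch rather than a described feature.

```python
from typing import List

# Hypothetical linear forwarding table: index = DLID, value = destination port.
linear_forwarding_table: List[int] = [0, 2, 2, 1]


def lookup_static_port(lft: List[int], dlid: int) -> int:
    """Return the statically routed destination port for a DLID."""
    # Assumption of this sketch: a DLID outside the table is an error.
    if not 0 <= dlid < len(lft):
        raise LookupError("DLID %d is not in the forwarding table" % dlid)
    return lft[dlid]
```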

In block 308, the managed network device 102 determines whether the statically routed destination port 120 is congested. The destination port 120 may be congested if the offered load on that port 120 exceeds the ejection rate of the receiver on the other side of the fabric link 106 (e.g., the receiving managed network device 102 or computing node 104). The managed network device 102 may use any monitoring technique to determine whether the destination port 120 is congested. For example, the managed network device 102 may use a congestion control agent, monitor for congestion notices received from remote devices, analyze flow control data, or perform any other appropriate monitoring. The managed network device 102 may determine whether the destination port 120 is congested on a per virtual lane basis, per service channel basis, per service level basis, or based on any other logical or physical subdivision of the fabric link 106. In some embodiments, in block 310 the managed network device 102 analyzes available flow control credits at the receiver and pending flow control credits to be transmitted by the managed network device 102. If flow control credits are not available at the receiver or pending flow control credits of the managed network device 102 are increasing, then the destination port 120 may be congested. In some embodiments, in block 312 the managed network device 102 may analyze a congestion log for congestion marking events. In some embodiments, in response to detecting congestion, the managed network device 102 may send a Forward Explicit Congestion Notification (FECN) to the receiver, for example, by setting an FECN bit on data packets exiting the managed network device 102. When marking a data packet with the FECN bit, the managed network device 102 may also record that marking event in the congestion log.
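The flow control credit analysis of block 310 may be sketched, for illustration only, as the following Python predicate. The sketch assumes that pending credits are sampled periodically and treats them as "increasing" when every sample in the window exceeds the previous one; the sampling scheme and the names shown are hypothetical, and other congestion metrics may be used.

```python
from typing import Sequence


def is_congested(receiver_credits: int, pending_samples: Sequence[int]) -> bool:
    """Return True if the destination port may be congested.

    receiver_credits: flow control credits currently available at the receiver.
    pending_samples:  recent samples of pending credits, oldest first
                      (assumed sampling scheme, not part of the description).
    """
    no_credits = receiver_credits <= 0
    # Assumption: "increasing" means strictly increasing across the window.
    pending_increasing = len(pending_samples) >= 2 and all(
        later > earlier
        for earlier, later in zip(pending_samples, pending_samples[1:])
    )
    return no_credits or pending_increasing
```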

In block 314, the managed network device 102 determines whether the statically routed destination port 120 is congested. If not, the method 300 branches ahead to block 332, described below. If the statically routed destination port 120 is congested, the method 300 advances to block 316.

In block 316, the managed network device 102 determines a destination port group based on the DLID. The destination port group may be identified as a collection of any two or more destination ports 120 of the managed network device 102. Destination port groups may overlap, meaning that a port 120 may be included in more than one port group. As further described below, each port group may map to one or more DLIDs, and each DLID is associated with exactly one port group. The fabric manager 108 may discover routes through the fabric and then configure the port groups and port group mappings accordingly. When there is only one possible path through the fabric for a particular DLID (e.g., a single destination port 120), that DLID may be assigned to an undefined port group (e.g., an empty set, null value, zero value, etc.).
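For illustration, the constraints on port groups described above (two or more ports per group, overlap permitted, exactly one group per DLID, and an undefined group for single-path DLIDs) may be checked by a configuration routine such as the following Python sketch. The data layout, the use of zero as the undefined identifier, and the function name are assumptions of the sketch.

```python
from typing import Dict, Set

UNDEFINED_PORT_GROUP = 0  # assumed identifier for "no alternative path"


def validate_port_groups(port_groups: Dict[int, Set[int]],
                         dlid_to_group: Dict[int, int]) -> None:
    """Raise ValueError if a hypothetical port group configuration is invalid."""
    for pg_id, ports in port_groups.items():
        if pg_id == UNDEFINED_PORT_GROUP:
            raise ValueError("the undefined port group cannot list ports")
        if len(ports) < 2:
            raise ValueError("port group %d needs two or more ports" % pg_id)
    # A dict gives each DLID exactly one group by construction; only check
    # that the referenced group exists or is the undefined group.
    for dlid, pg_id in dlid_to_group.items():
        if pg_id != UNDEFINED_PORT_GROUP and pg_id not in port_groups:
            raise ValueError("DLID %d maps to unknown group %d" % (dlid, pg_id))


# Overlapping groups are allowed: port 2 appears in both groups.
groups = {1: {1, 2}, 2: {2, 3}}
validate_port_groups(groups, {10: 1, 11: 2, 12: UNDEFINED_PORT_GROUP})
```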

In some embodiments, in block 318, the managed network device 102 may look up the port group in a port group forwarding table. For example, the managed network device 102 may index the port group forwarding table using the DLID to identify the unique port group identifier of the destination port group. The port group forwarding table may have a similar structure to the linear forwarding table, and may be accessed or otherwise maintained by the fabric manager 108 similarly to the linear forwarding table. Referring now to FIG. 4, a schematic diagram 400 illustrates one potential embodiment of a port group forwarding table 402. In the illustrative embodiment, the table 402 may be indexed by a DLID lid_i to generate the corresponding port group identifier pg_id_i. Thus, in the illustrative embodiment, each DLID may be used to retrieve exactly one port group identifier, but each port group identifier may be referenced by more than one DLID. Each DLID may be embodied as, for example, a 16-bit binary value, and each port group identifier may be embodied as an eight-bit binary value. The undefined port group may have the port group identifier zero (0x00), and valid port groups may have identifiers from one to 255 (0x01 to 0xFF). In some embodiments, the managed network device 102 may support fewer than the potential maximum of 255 port groups; however, in many embodiments the managed network device 102 may support at least twice as many port groups as the managed network device 102 has ports 120.
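As one illustrative sketch of the lookup of block 318, the port group forwarding table may be modeled as a byte array indexed by DLID, so that each entry is an eight-bit port group identifier. The table contents and names below are hypothetical.

```python
UNDEFINED_PORT_GROUP = 0x00

# Hypothetical port group forwarding table: index = DLID, value = 8-bit
# port group identifier. DLIDs 1 and 2 share port group 0x01.
port_group_forwarding_table = bytes([0x00, 0x01, 0x01, 0x02])


def lookup_port_group(pgft: bytes, dlid: int) -> int:
    """Return the single port group identifier associated with a DLID."""
    if not 0 <= dlid < len(pgft):
        raise LookupError("DLID %d is not in the table" % dlid)
    return pgft[dlid]  # indexing bytes yields an int in 0..255
```

In this sketch several DLIDs may retrieve the same identifier, while each DLID retrieves exactly one.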

Referring back to FIG. 3, in block 320, the managed network device 102 determines a dynamic destination port 120 from the port group associated with the DLID. The managed network device 102 may use any strategy for selecting the dynamic destination port 120. In some embodiments, the managed network device 102 may ensure that the dynamic destination port 120 is not the same as the statically routed destination port 120, to avoid congestion. In some embodiments, the managed network device 102 may look up the destination ports 120 in a port group table. For example, the managed network device 102 may index the port group table using the port group identifier to identify a port mask associated with the port group. The port mask may be embodied as a bitmask or other data identifying the destination ports 120 of the managed network device 102 included in the port group. The port group table and port mask entries may have a similar structure to a multicast forwarding table, and may be accessed or otherwise maintained by the fabric manager 108 similarly to the multicast forwarding table. Referring again to FIG. 4, the schematic diagram 400 illustrates one potential embodiment of a port group table 404. As shown, the port group identifier retrieved from the port group forwarding table 402 may be used to index the port group table 404. In the illustrative embodiment, the port group identifier may be embodied as an eight-bit value. The identifier zero (0x00) maps to the undefined port group. Valid port group identifiers one through 255 (0x01 through 0xFF) each map to a port mask, which is illustrated as a 256-bit bitmask p₂₅₅, p₂₅₄, . . . , p₁, p₀. Each bit pᵢ of the port mask corresponds to a destination port 120 of the managed network device 102. Thus, the ports 120 included in each port group may correspond to the bits set in the associated port mask. Bits of the bitmask beyond the number of ports 120 in the managed network device 102 may be ignored on write and read back as zero, or may be otherwise disregarded. Although illustrated as a 256-bit bitmask, it should be understood that the port masks may include any appropriate amount of data.
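By way of illustration, decoding a port mask into the ports 120 of a port group may be sketched in Python as follows. The sketch represents each port mask as an integer in which bit i corresponds to port i, and disregards bits beyond the number of ports; the table contents and names are hypothetical.

```python
from typing import List

# Hypothetical port group table: index = port group identifier, value = mask.
# Entry 0 is the undefined port group and carries no ports.
port_group_table: List[int] = [
    0b0000,  # 0x00: undefined port group
    0b0110,  # 0x01: ports 1 and 2
    0b1100,  # 0x02: ports 2 and 3 (overlaps group 0x01 on port 2)
]


def ports_in_mask(mask: int, num_ports: int) -> List[int]:
    """Return the port numbers whose bits are set in a port mask.

    Bits at positions >= num_ports are disregarded.
    """
    return [port for port in range(num_ports) if (mask >> port) & 1]
```

For a device with four ports, for example, `ports_in_mask(port_group_table[1], 4)` lists ports 1 and 2.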

Referring back to FIG. 3, as described above, the managed network device 102 may use any strategy for selecting the destination port 120 from the port group. In some embodiments, in block 324 the managed network device 102 may randomly select a destination port 120 from the port group. In some embodiments, in block 326, the managed network device 102 may select the least-loaded destination port 120 from the port group. The managed network device 102 may use any appropriate metric to determine the least-loaded port 120, for example selecting the destination port 120 with the smallest queue occupancy/depth, the least congestion, or otherwise least-loaded. In some embodiments, in block 328, the managed network device 102 may randomly select from two or more of the least-loaded ports 120 of the port group.
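The three selection strategies of blocks 324, 326, and 328 may be sketched, for illustration only, as follows. The sketch assumes that load is measured as queue depth per port and that the candidate list is non-empty; the parameter `k`, which bounds how many least-loaded ports are considered in the third strategy, and all function names are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence


def select_random(ports: Sequence[int],
                  rng: Optional[random.Random] = None) -> int:
    """Block 324: pick any port of the port group at random."""
    return (rng or random).choice(list(ports))


def select_least_loaded(ports: Sequence[int], load: Dict[int, int]) -> int:
    """Block 326: pick the port with the smallest load (assumed queue depth)."""
    return min(ports, key=lambda port: load[port])


def select_greedy_random(ports: Sequence[int], load: Dict[int, int], k: int = 2,
                         rng: Optional[random.Random] = None) -> int:
    """Block 328: pick at random among the k least-loaded ports."""
    least_loaded = sorted(ports, key=lambda port: load[port])[:k]
    return (rng or random).choice(least_loaded)
```

Random selection among several least-loaded ports may help avoid steering every rerouted flow onto the same single port, at the cost of sometimes choosing a slightly busier one.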

In block 330, the managed network device 102 updates the static routing information with the dynamic destination port 120. The managed network device 102 may, for example, replace the entry for the statically routed destination port 120 in the linear forwarding table with the dynamically determined destination port 120.

In block 332, the managed network device 102 forwards the data packet to the destination port 120. The managed network device 102 may, for example, forward the data packet to the destination port 120 described in the linear forwarding table. As described above, the destination port 120 described in the linear forwarding table may be the statically routed destination port 120 determined as described above in block 304 if that port 120 is not congested, or the dynamically determined destination port 120 as described above in connection with block 320. After forwarding the data packet to the destination port 120, the method 300 loops back to block 302 to continue processing data packets. Although the method 300 of FIG. 3 is illustrated as executing sequentially, it should be understood that in some embodiments, the managed network device 102 may perform the operations of the method 300 in parallel, simultaneously, or in any other order. For example, in some embodiments operations may be performed in parallel by hardware resources of the managed network device 102.
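The overall flow of the method 300 may be summarized, by way of example and not limitation, in the following Python sketch. It is a simplified software model rather than a description of any particular hardware embodiment. The sketch assumes that congestion state is supplied as a set of port numbers, that load is a per-port queue depth, that the least-loaded strategy is used, and that the packet stays on the statically routed port when the DLID maps to the undefined port group or no alternative port exists. The class and attribute names are hypothetical.

```python
from typing import Dict, List, Set

UNDEFINED_PORT_GROUP = 0


class ManagedSwitchModel:
    """Toy model of the forwarding decision for one data packet."""

    def __init__(self, lft: List[int], pgft: List[int], pgt: List[int],
                 num_ports: int) -> None:
        self.lft = lft            # linear forwarding table: DLID -> port
        self.pgft = pgft          # port group forwarding table: DLID -> group id
        self.pgt = pgt            # port group table: group id -> port mask
        self.num_ports = num_ports
        self.congested: Set[int] = set()  # stand-in for congestion monitoring
        self.queue_depth: Dict[int, int] = {p: 0 for p in range(num_ports)}

    def route(self, dlid: int) -> int:
        static_port = self.lft[dlid]                 # blocks 304, 306
        if static_port not in self.congested:        # blocks 308, 314
            return static_port                       # block 332
        pg_id = self.pgft[dlid]                      # blocks 316, 318
        if pg_id == UNDEFINED_PORT_GROUP:
            return static_port                       # assumed fallback
        mask = self.pgt[pg_id]
        candidates = [p for p in range(self.num_ports)
                      if (mask >> p) & 1 and p != static_port]
        if not candidates:
            return static_port                       # assumed fallback
        dynamic_port = min(candidates,               # blocks 320, 326
                           key=lambda p: self.queue_depth[p])
        self.lft[dlid] = dynamic_port                # block 330
        return dynamic_port                          # block 332
```

Because block 330 rewrites the linear forwarding table entry, later packets for the same DLID follow the dynamically selected port until that port is in turn found to be congested.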

EXAMPLES

Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.

Example 1 includes a network device for data packet forwarding, the network device comprising a packet ingress module to extract a destination local identifier (DLID) from a data packet; a static route module to determine a statically routed destination port of the network device as a function of the DLID; a congestion monitoring module to determine whether the statically routed destination port is congested; and an adaptive route module to determine a port group as a function of the DLID in response to a determination that the statically routed destination port is congested, wherein the port group identifies two or more ports of the network device; select a dynamic destination port of the port group in response to the determination that the statically routed destination port is congested; and forward the data packet to the dynamic destination port in response to the determination that the statically routed destination port is congested.

Example 2 includes the subject matter of Example 1, and wherein the DLID comprises a binary value that is 32, 24, 20, or 16 bits long.

Example 3 includes the subject matter of any of Examples 1 and 2, and wherein to determine the statically routed destination port comprises to index a linear forwarding table with the DLID to determine the statically routed destination port.

Example 4 includes the subject matter of any of Examples 1-3, and wherein to determine whether the statically routed destination port is congested comprises to analyze available flow control credits associated with the destination port.

Example 5 includes the subject matter of any of Examples 1-4, and wherein to determine whether the statically routed destination port is congested comprises to analyze a congestion log associated with the destination port.

Example 6 includes the subject matter of any of Examples 1-5, and wherein to determine the port group as a function of the DLID comprises to determine a port group identifier, wherein the port group identifier includes an integer value between 1 and 255, inclusive.

Example 7 includes the subject matter of any of Examples 1-6, and wherein to determine the port group as a function of the DLID comprises to index a port group forwarding table with the DLID to determine a port group identifier.
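The table lookups of Examples 3, 6, and 7 may be sketched as two arrays indexed by the DLID. In the sketch below, the function name `lookup_routes`, the list-based table layout, and the use of identifier 0 to mean "no port group" are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# Assumption: identifier 0 marks "no port group" for a DLID.
NO_PORT_GROUP = 0


def lookup_routes(
    lft: List[int], pgft: List[int], dlid: int
) -> Tuple[int, Optional[int]]:
    """Index the linear forwarding table and the port group forwarding
    table with the same DLID.

    Returns the statically routed port and the port group identifier,
    or None in place of the identifier when the DLID has no port group.
    """
    static_port = lft[dlid]
    pgid = pgft[dlid]
    if pgid == NO_PORT_GROUP:
        return static_port, None
    if not 1 <= pgid <= 255:
        raise ValueError(f"port group identifier out of range: {pgid}")
    return static_port, pgid
```

Because both tables share the DLID as their index, the two lookups are independent and could be performed in parallel in hardware.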

Example 8 includes the subject matter of any of Examples 1-7, and wherein to select the dynamic destination port of the port group comprises to index a port group table with the port group identifier to determine a port group mask, wherein the port group mask is indicative of a plurality of valid destination ports for the DLID; and select the dynamic destination port from the plurality of valid destination ports of the port group mask.

Example 9 includes the subject matter of any of Examples 1-8, and wherein the port group mask comprises a binary value that includes 256 bits, and wherein each bit of the port group mask is associated with a corresponding port of the network device.
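To illustrate the 256-bit port group mask of Example 9, the sketch below expands a mask into the list of ports it marks as valid. It assumes that bit i of the mask (counting from the least significant bit) corresponds to port i; the disclosure states only that each bit is associated with a corresponding port, so this bit ordering and the function name `ports_in_mask` are assumptions.

```python
from typing import List

MASK_BITS = 256  # Width of the port group mask in Example 9.


def ports_in_mask(mask: int) -> List[int]:
    """List the port numbers whose bits are set in a 256-bit mask.

    Assumes bit i (least significant bit = 0) corresponds to port i.
    """
    if not 0 <= mask < (1 << MASK_BITS):
        raise ValueError("mask does not fit in 256 bits")
    return [port for port in range(MASK_BITS) if (mask >> port) & 1]
```

For instance, under this bit ordering the mask `0b10110` marks ports 1, 2, and 4 as valid destination ports.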

Example 10 includes the subject matter of any of Examples 1-9, and wherein to select the dynamic destination port from the plurality of valid destination ports comprises to randomly select the dynamic destination port from the plurality of valid destination ports.

Example 11 includes the subject matter of any of Examples 1-10, and wherein to select the dynamic destination port from the plurality of valid destination ports comprises to select a least-loaded destination port of the plurality of valid destination ports as the dynamic destination port.

Example 12 includes the subject matter of any of Examples 1-11, and wherein to select the dynamic destination port from the plurality of valid destination ports comprises to randomly select the dynamic destination port from a plurality of least-loaded destination ports of the plurality of valid destination ports.
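The three selection policies of Examples 10-12 may be sketched as follows. The per-port `load` mapping is a hypothetical metric (for example, a queue depth) that the disclosure does not define, and the function names are illustrative. Breaking ties in Example 11 by taking the first port is likewise an assumption.

```python
import random
from typing import Dict, List, Optional


def select_random(ports: List[int], rng: Optional[random.Random] = None) -> int:
    """Example 10: pick any valid port uniformly at random."""
    return (rng or random).choice(ports)


def least_loaded(ports: List[int], load: Dict[int, int]) -> List[int]:
    """Return every valid port that shares the minimum load."""
    lowest = min(load[p] for p in ports)
    return [p for p in ports if load[p] == lowest]


def select_least_loaded(ports: List[int], load: Dict[int, int]) -> int:
    """Example 11: pick a least-loaded port (first one on ties)."""
    return least_loaded(ports, load)[0]


def select_random_least_loaded(
    ports: List[int], load: Dict[int, int], rng: Optional[random.Random] = None
) -> int:
    """Example 12: pick at random among the least-loaded ports."""
    return (rng or random).choice(least_loaded(ports, load))
```

Choosing at random among equally loaded ports, as in Example 12, avoids steering every rerouted packet to the same port when several ports tie for the minimum load.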

Example 13 includes the subject matter of any of Examples 1-12, and wherein the static route module is further to forward the data packet to the statically routed destination port in response to a determination that the statically routed destination port is not congested.
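Taken together, Examples 1-13 describe a forwarding decision that might be sketched as below. The function name `forward_port`, the dictionary-based port group table, the bit-i-to-port-i mask ordering, and the fallback to the static port when no port group exists are all assumptions for illustration; the random policy of Example 10 is used here, but any of the selection policies could be substituted.

```python
import random
from typing import Callable, Dict, List


def forward_port(
    dlid: int,
    lft: List[int],
    pgft: List[int],
    port_group_table: Dict[int, int],
    is_congested: Callable[[int], bool],
) -> int:
    """Choose the egress port for a packet addressed to dlid."""
    static_port = lft[dlid]
    if not is_congested(static_port):
        return static_port  # Example 13: static route when uncongested.

    # Congested: look up the port group identifier, then its mask.
    pgid = pgft[dlid]
    mask = port_group_table.get(pgid, 0)
    candidates = [p for p in range(mask.bit_length()) if (mask >> p) & 1]
    if not candidates:
        return static_port  # Assumption: fall back when no port group exists.
    return random.choice(candidates)  # Random policy, as in Example 10.
```

This sketch does not exclude the congested static port from the candidates if the mask happens to contain it; whether to do so is left open here.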

Example 14 includes a method for adaptive data packet routing, the method comprising extracting, by a network device, a destination local identifier (DLID) from a data packet; determining, by the network device, a statically routed destination port of the network device as a function of the DLID; determining, by the network device, whether the statically routed destination port is congested; determining, by the network device, a port group as a function of the DLID in response to determining the statically routed destination port is congested, wherein the port group identifies two or more ports of the network device; selecting, by the network device, a dynamic destination port of the port group in response to determining the statically routed destination port is congested; and forwarding, by the network device, the data packet to the dynamic destination port in response to determining the statically routed destination port is congested.

Example 15 includes the subject matter of Example 14, and wherein the DLID comprises a binary value that is 32, 24, 20, or 16 bits long.

Example 16 includes the subject matter of any of Examples 14 and 15, and wherein determining the statically routed destination port comprises indexing a linear forwarding table with the DLID to determine the statically routed destination port.

Example 17 includes the subject matter of any of Examples 14-16, and wherein determining whether the statically routed destination port is congested comprises analyzing available flow control credits associated with the destination port.

Example 18 includes the subject matter of any of Examples 14-17, and wherein determining whether the statically routed destination port is congested comprises analyzing a congestion log associated with the destination port.

Example 19 includes the subject matter of any of Examples 14-18, and wherein determining the port group as a function of the DLID comprises determining a port group identifier, wherein the port group identifier includes an integer value between 1 and 255, inclusive.

Example 20 includes the subject matter of any of Examples 14-19, and wherein determining the port group as a function of the DLID comprises indexing a port group forwarding table with the DLID to determine a port group identifier.

Example 21 includes the subject matter of any of Examples 14-20, and wherein selecting the dynamic destination port of the port group comprises indexing a port group table with the port group identifier to determine a port group mask, wherein the port group mask is indicative of a plurality of valid destination ports for the DLID; and selecting the dynamic destination port from the plurality of valid destination ports of the port group mask.

Example 22 includes the subject matter of any of Examples 14-21, and wherein the port group mask comprises a binary value including 256 bits, and wherein each bit of the port group mask is associated with a corresponding port of the network device.

Example 23 includes the subject matter of any of Examples 14-22, and wherein selecting the dynamic destination port from the plurality of valid destination ports comprises randomly selecting the dynamic destination port from the plurality of valid destination ports.

Example 24 includes the subject matter of any of Examples 14-23, and wherein selecting the dynamic destination port from the plurality of valid destination ports comprises selecting a least-loaded destination port of the plurality of valid destination ports as the dynamic destination port.

Example 25 includes the subject matter of any of Examples 14-24, and wherein selecting the dynamic destination port from the plurality of valid destination ports comprises randomly selecting the dynamic destination port from a plurality of least-loaded destination ports of the plurality of valid destination ports.

Example 26 includes the subject matter of any of Examples 14-25, and further comprising forwarding, by the network device, the data packet to the statically routed destination port in response to determining the statically routed destination port is not congested.

Example 27 includes a computing device comprising a processor; and a memory having stored therein a plurality of instructions that when executed by the processor cause the computing device to perform the method of any of Examples 14-26.

Example 28 includes one or more machine readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a computing device performing the method of any of Examples 14-26.

Example 29 includes a computing device comprising means for performing the method of any of Examples 14-26.

Example 30 includes a network device for data packet forwarding, the network device comprising means for extracting a destination local identifier (DLID) from a data packet; means for determining a statically routed destination port of the network device as a function of the DLID; means for determining whether the statically routed destination port is congested; means for determining a port group as a function of the DLID in response to determining the statically routed destination port is congested, wherein the port group identifies two or more ports of the network device; means for selecting a dynamic destination port of the port group in response to determining the statically routed destination port is congested; and means for forwarding the data packet to the dynamic destination port in response to determining the statically routed destination port is congested.

Example 31 includes the subject matter of Example 30, and wherein the DLID comprises a binary value that is 32, 24, 20, or 16 bits long.

Example 32 includes the subject matter of any of Examples 30 and 31, and wherein the means for determining the statically routed destination port comprises means for indexing a linear forwarding table with the DLID to determine the statically routed destination port.

Example 33 includes the subject matter of any of Examples 30-32, and wherein the means for determining whether the statically routed destination port is congested comprises means for analyzing available flow control credits associated with the destination port.

Example 34 includes the subject matter of any of Examples 30-33, and wherein the means for determining whether the statically routed destination port is congested comprises means for analyzing a congestion log associated with the destination port.

Example 35 includes the subject matter of any of Examples 30-34, and wherein the means for determining the port group as a function of the DLID comprises means for determining a port group identifier, wherein the port group identifier includes an integer value between 1 and 255, inclusive.

Example 36 includes the subject matter of any of Examples 30-35, and wherein the means for determining the port group as a function of the DLID comprises means for indexing a port group forwarding table with the DLID to determine a port group identifier.

Example 37 includes the subject matter of any of Examples 30-36, and wherein the means for selecting the dynamic destination port of the port group comprises means for indexing a port group table with the port group identifier to determine a port group mask, wherein the port group mask is indicative of a plurality of valid destination ports for the DLID; and means for selecting the dynamic destination port from the plurality of valid destination ports of the port group mask.

Example 38 includes the subject matter of any of Examples 30-37, and wherein the port group mask comprises a binary value including 256 bits, and wherein each bit of the port group mask is associated with a corresponding port of the network device.

Example 39 includes the subject matter of any of Examples 30-38, and wherein the means for selecting the dynamic destination port from the plurality of valid destination ports comprises means for randomly selecting the dynamic destination port from the plurality of valid destination ports.

Example 40 includes the subject matter of any of Examples 30-39, and wherein the means for selecting the dynamic destination port from the plurality of valid destination ports comprises means for selecting a least-loaded destination port of the plurality of valid destination ports as the dynamic destination port.

Example 41 includes the subject matter of any of Examples 30-40, and wherein the means for selecting the dynamic destination port from the plurality of valid destination ports comprises means for randomly selecting the dynamic destination port from a plurality of least-loaded destination ports of the plurality of valid destination ports.

Example 42 includes the subject matter of any of Examples 30-41, and further comprising means for forwarding the data packet to the statically routed destination port in response to determining the statically routed destination port is not congested.

Claims

1-25. (canceled)
26. A network device for data packet forwarding, the network device comprising:
a packet ingress module to extract a destination local identifier (DLID) from a data packet;
a static route module to determine a statically routed destination port of the network device as a function of the DLID;
a congestion monitoring module to determine whether the statically routed destination port is congested; and
an adaptive route module to:
determine a port group as a function of the DLID in response to a determination that the statically routed destination port is congested, wherein the port group identifies two or more ports of the network device;
select a dynamic destination port of the port group in response to the determination that the statically routed destination port is congested; and
forward the data packet to the dynamic destination port in response to the determination that the statically routed destination port is congested.
27. The network device of claim 26, wherein the DLID comprises a binary value that is 32, 24, 20, or 16 bits long.
28. The network device of claim 26, wherein to determine whether the statically routed destination port is congested comprises to analyze available flow control credits associated with the destination port.
29. The network device of claim 26, wherein to determine whether the statically routed destination port is congested comprises to analyze a congestion log associated with the destination port.
30. The network device of claim 26, wherein to determine the port group as a function of the DLID comprises to determine a port group identifier, wherein the port group identifier includes an integer value between 1 and 255, inclusive.
31. The network device of claim 26, wherein to determine the port group as a function of the DLID comprises to index a port group forwarding table with the DLID to determine a port group identifier.
32. The network device of claim 31, wherein to select the dynamic destination port of the port group comprises to:
index a port group table with the port group identifier to determine a port group mask, wherein the port group mask is indicative of a plurality of valid destination ports for the DLID; and
select the dynamic destination port from the plurality of valid destination ports of the port group mask.
33. The network device of claim 32, wherein the port group mask comprises a binary value that includes 256 bits, and wherein each bit of the port group mask is associated with a corresponding port of the network device.
34. The network device of claim 32, wherein to select the dynamic destination port from the plurality of valid destination ports comprises to randomly select the dynamic destination port from the plurality of valid destination ports.
35. The network device of claim 32, wherein to select the dynamic destination port from the plurality of valid destination ports comprises to select a least-loaded destination port of the plurality of valid destination ports as the dynamic destination port.
36. The network device of claim 32, wherein to select the dynamic destination port from the plurality of valid destination ports comprises to randomly select the dynamic destination port from a plurality of least-loaded destination ports of the plurality of valid destination ports.
37. A method for adaptive data packet routing, the method comprising:
extracting, by a network device, a destination local identifier (DLID) from a data packet;
determining, by the network device, a statically routed destination port of the network device as a function of the DLID;
determining, by the network device, whether the statically routed destination port is congested;
determining, by the network device, a port group as a function of the DLID in response to determining the statically routed destination port is congested, wherein the port group identifies two or more ports of the network device;
selecting, by the network device, a dynamic destination port of the port group in response to determining the statically routed destination port is congested; and
forwarding, by the network device, the data packet to the dynamic destination port in response to determining the statically routed destination port is congested.
38. The method of claim 37, wherein determining the port group as a function of the DLID comprises indexing a port group forwarding table with the DLID to determine a port group identifier.
39. The method of claim 38, wherein selecting the dynamic destination port of the port group comprises:
indexing a port group table with the port group identifier to determine a port group mask, wherein the port group mask is indicative of a plurality of valid destination ports for the DLID; and
selecting the dynamic destination port from the plurality of valid destination ports of the port group mask.
40. The method of claim 39, wherein selecting the dynamic destination port from the plurality of valid destination ports comprises randomly selecting the dynamic destination port from the plurality of valid destination ports.
41. The method of claim 39, wherein selecting the dynamic destination port from the plurality of valid destination ports comprises selecting a least-loaded destination port of the plurality of valid destination ports as the dynamic destination port.
42. The method of claim 39, wherein selecting the dynamic destination port from the plurality of valid destination ports comprises randomly selecting the dynamic destination port from a plurality of least-loaded destination ports of the plurality of valid destination ports.
43. One or more computer-readable storage media comprising a plurality of instructions that in response to being executed cause a network device to:
extract a destination local identifier (DLID) from a data packet;
determine a statically routed destination port of the network device as a function of the DLID;
determine whether the statically routed destination port is congested;
determine a port group as a function of the DLID in response to determining the statically routed destination port is congested, wherein the port group identifies two or more ports of the network device;
select a dynamic destination port of the port group in response to determining the statically routed destination port is congested; and
forward the data packet to the dynamic destination port in response to determining the statically routed destination port is congested.
44. The one or more computer-readable storage media of claim 43, wherein to determine the port group as a function of the DLID comprises to index a port group forwarding table with the DLID to determine a port group identifier.
45. The one or more computer-readable storage media of claim 44, wherein to select the dynamic destination port of the port group comprises to:
index a port group table with the port group identifier to determine a port group mask, wherein the port group mask is indicative of a plurality of valid destination ports for the DLID; and
select the dynamic destination port from the plurality of valid destination ports of the port group mask.
46. The one or more computer-readable storage media of claim 45, wherein to select the dynamic destination port from the plurality of valid destination ports comprises to randomly select the dynamic destination port from the plurality of valid destination ports.
47. The one or more computer-readable storage media of claim 45, wherein to select the dynamic destination port from the plurality of valid destination ports comprises to select a least-loaded destination port of the plurality of valid destination ports as the dynamic destination port.
48. The one or more computer-readable storage media of claim 45, wherein to select the dynamic destination port from the plurality of valid destination ports comprises to randomly select the dynamic destination port from a plurality of least-loaded destination ports of the plurality of valid destination ports.

PCT Information

Filing Document	Filing Date	Country	Kind
PCT/US14/72461	12/27/2014	WO	00
