A network can include various electronic devices that are connected to each other, such as through one or multiple switches. Data communication with or among the electronic devices is accomplished through the switch(es). In some cases, the connection infrastructure between the electronic devices and the switch(es) can include an optical connection infrastructure, which includes optical signal conduits (e.g. optical fibers or optical waveguides).
Some embodiments are described with respect to the following figures:
In a network, different connection topologies can be used to interconnect electronic devices to intermediate devices, such as switches. Electronic devices can communicate with each other through a network that includes the switches or other types of intermediate devices. Examples of electronic devices include client computers, server computers, storage devices, and so forth. A “switch” can refer to any device used for passing data among the electronic devices or between the electronic devices and other devices. A “switch” can also refer to a router or a gateway or any other type of device that allows for interconnection between different devices.
In the ensuing discussion, reference is made to arrangements in which electronic devices are connected to a switch (or multiple switches). It is noted that techniques or mechanisms according to some implementations can also be applied in other contexts in which different devices are interconnected to each other using a connection infrastructure.
A “connection topology” of a connection infrastructure to interconnect electronic devices to a switch can refer to a specific arrangement of signal paths that are used to interconnect the electronic devices with the switch (or switches).
Although some example connection topologies are illustrated in
In some implementations, the connection infrastructure used between the electronic devices and a switch (or multiple switches) is an optical connection infrastructure. The optical connection infrastructure includes optical signal conduits, where an optical signal conduit can include an optical fiber or optical waveguide and associated components, such as reflectors, splitters, and so forth.
An optical signal conduit is part of an optical link, which includes the optical signal conduit in addition to other components, such as optical connectors (e.g. blind-mate optical connectors) and electrical-optical converters (to convert between electrical signals and optical signals). For example, as shown in
As further shown in
In the example of
Depending on operations or applications to be provided in a network, one connection topology may be more efficient than another connection topology (such as in terms of connectivity cost versus connection bandwidths). However, it can be difficult to change the connection topology of an optical connection infrastructure (such as the optical connection infrastructure 201 of
In addition to changing connection topologies of optical connection infrastructures for different operations or applications in a network, it may also be desirable to change connection topologies to accommodate new designs of electronic devices or switches. It may also be desirable to modify a connection topology in response to a changing networking standard, or in response to a changing environment of an enterprise (e.g. business concern, government agency, business organization, individual, etc.).
In accordance with some implementations, dynamic reconfiguration of an optical connection infrastructure can be performed without replacing or modifying any physical components of the optical connection infrastructure. In some implementations, the dynamic reconfiguration is performed by programmatic reconfiguration (between different settings) of network interface components (such as the NIC 212 of
Note that in the example in
Each port of the internal switch interface 214 is a four-lane port in the depicted example. Each four-lane port of the internal switch interface 214 is connected to a four-lane path 308, which is connected to four electronic devices 102. Thus, each four-lane port of the internal switch interface 214 is connected to a respective group of four electronic devices 102. Each four-lane path 308 is connected to the NICs 212 of the electronic devices 102. Note that each NIC 212 has a four-lane port to communicate with the four-lane path 308. Also, the electrical-optical converter 202 (shown in
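The port-to-group wiring described above can be sketched in a few lines. This is a purely illustrative model, not part of the disclosed implementation; the function and NIC naming are assumptions made for the sketch.

```python
# Illustrative sketch (not the disclosed implementation): each four-lane
# switch interface port fans out over a four-lane path to a group of four
# electronic devices, one lane per NIC. NIC names are hypothetical.

def build_port_map(num_ports, lanes_per_port=4):
    """Map each switch interface port to the NICs in its group."""
    return {
        port: [f"NIC{port * lanes_per_port + i}" for i in range(lanes_per_port)]
        for port in range(num_ports)
    }

port_map = build_port_map(num_ports=2)
assert port_map[0] == ["NIC0", "NIC1", "NIC2", "NIC3"]
assert port_map[1] == ["NIC4", "NIC5", "NIC6", "NIC7"]
```

Each key corresponds to one four-lane port of the internal switch interface 214, and each value lists the group of four electronic devices 102 reachable over that port's four-lane path.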
Multiple groups 310 and 312 of electronic devices 102 are shown in
The various paths between the switch 104 and the electronic devices 102 are part of the optical connection infrastructure 201 of
In some examples, each one of the multiple groups 310 and 312 can be reconfigured to change the network topology of the optical connection infrastructure 201. In other examples, less than all of the multiple groups 310 and 312 can be reconfigured to change the network topology.
The flexibility in reconfiguring the network topology of the optical connection infrastructure 201 allows an enterprise to balance performance, power, and cost in connecting electronic devices to one or multiple switches. Also, mechanisms according to some implementations for connecting electronic devices to a switch allow for a reduction in the number of ports that have to be provided on the switch.
In the example of
With the arrangement of
For the
Each NIC 212 can be reconfigured by reprogramming a predefined portion of the NIC. For example, the NIC 212 can include a configuration register that when programmed with different values causes different combinations of lanes of the four-lane port to be enabled and disabled. Alternatively, the NIC 212 can include one or multiple input control pins that can be driven to different values to control the enabling/disabling of the lanes of the four-lane port.
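A configuration register of the kind described above can be modeled as a simple bitmask, one enable bit per lane. The register layout below is an assumption for illustration only; the patent does not specify the bit assignment.

```python
# Hypothetical sketch of lane enable/disable control via a NIC
# configuration register. The one-bit-per-lane layout is assumed.

class NicPortConfig:
    """Models a four-lane NIC port whose lanes are gated by a register."""

    NUM_LANES = 4

    def __init__(self):
        self.lane_enable_reg = 0b1111  # all four lanes enabled by default

    def program(self, value):
        """Write the configuration register (bit i gates lane i)."""
        self.lane_enable_reg = value & 0b1111

    def enabled_lanes(self):
        return [i for i in range(self.NUM_LANES)
                if self.lane_enable_reg & (1 << i)]

# Star topology: a NIC keeps all four lanes of its port enabled.
star = NicPortConfig()
star.program(0b1111)
assert star.enabled_lanes() == [0, 1, 2, 3]

# Shared-bus topology: a NIC is dedicated a single lane (e.g. lane 2).
shared = NicPortConfig()
shared.program(0b0100)
assert shared.enabled_lanes() == [2]
```

Programming different register values across the NICs of a group thus switches the group between topologies without touching any physical component.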
Reconfiguring the NICs of the electronic devices in the group 310 to change the network topology between the star topology (
The dynamic reconfiguration of the NICs 212 to provide the different connection topologies can be controlled by a controller 320. The controller 320 can be part of the switch 104, or alternatively, the controller 320 can be a system controller (e.g. rack controller) that is able to communicate with the switch 104 to cause the switch 104 to reprogram the electronic devices 102.
The controller 320 can include control logic 322, which can be implemented as machine-readable instructions executable on one or multiple processors 324. The processor(s) 324 can be connected to a storage medium (or storage media) 326. The control logic 322 is executable to perform various tasks, including the control of dynamic reconfiguration of a network topology of an optical connection infrastructure.
Each lane discussed in connection with FIGS. 3 and 4A-4B can be a transmit lane or a receive lane, or both. In some examples, both transmit lanes and receive lanes are configured either as dedicated lanes or shared lanes. This provides a pseudo-symmetric bandwidth between the transmit and receive lanes, where the bandwidth in the transmit direction and receive direction are generally the same.
The control logic 322 can dynamically reconfigure the NICs' lanes to be shared or dedicated. Also, dedicated NIC lanes can be reconfigured to different dedicated lanes to handle a faulty lane condition. For example, if a dedicated lane for a NIC's transmitter becomes non-functional, then another lane can be reconfigured to be dedicated, which enables higher fault resiliency for the NIC transmit lanes. To illustrate this example, assume that NIC1's transmitter is dedicated to lane 0 and NIC2's transmitter is dedicated to lane 1. When NIC1 detects that its transmit lane is non-operational, it notifies the controller 320, and the controller 320 commands NIC2 to stop its transmission on its transmit lane 1 after the current operation. After NIC2 and the switch 104 acknowledge to the controller 320 that they have disabled use of lane 1 for communications by NIC2's transmitter, the controller 320 commands NIC1 to use lane 1 to transmit and the switch to use lane 1 to receive communication from NIC1. In addition, the controller 320 can command NIC2 to use its lane 0 to transmit and the switch to receive NIC2's communication on lane 0.
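The lane reassignment sequence just described can be sketched as follows. This is a simplified model under stated assumptions, not the controller's actual implementation; the acknowledgement callback and dictionary representation are hypothetical.

```python
# Illustrative sketch of the fault-driven lane swap described above.
# assignments maps NIC name -> dedicated transmit lane number; acks is a
# hypothetical callable modeling NIC/switch acknowledgements.

def handle_faulty_tx_lane(assignments, faulty_nic, donor_nic, acks):
    """Hand the donor NIC's dedicated transmit lane to the faulty NIC."""
    donor_lane = assignments[donor_nic]
    faulty_lane = assignments[faulty_nic]

    # Step 1: quiesce the donor's lane; both the donor NIC and the switch
    # must acknowledge before the lane is handed over.
    if not (acks(donor_nic) and acks("switch")):
        raise RuntimeError("lane handover not acknowledged")

    # Step 2: the faulty NIC takes over the donor's lane, and the donor
    # moves to the lane previously dedicated to the faulty NIC.
    assignments[faulty_nic] = donor_lane
    assignments[donor_nic] = faulty_lane
    return assignments

# NIC1's transmitter on lane 0 fails; NIC2 donates lane 1.
lanes = {"NIC1": 0, "NIC2": 1}
handle_faulty_tx_lane(lanes, "NIC1", "NIC2", acks=lambda who: True)
assert lanes == {"NIC1": 1, "NIC2": 0}
```

Note that NIC2 moving to lane 0 is viable in this example because the fault is in NIC1's lane-0 transmitter, not necessarily in the lane itself.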
In alternative examples, the connection topology for transmit lanes and receive lanes of the switch 104 can be different. For example, the receive lanes (to communicate data sent from the electronic devices 102 to the switch 104) can be configured as dedicated lanes, while the transmit lanes (to communicate data sent from the switch 104 to the electronic devices 102) are configured as shared lanes. Such an arrangement provides asymmetric bandwidth, where greater bandwidth is available on the NIC's 212 receive lanes and less bandwidth on its transmit lanes. Asymmetric bandwidth on the transmit and receive lanes can be useful for certain applications, such as applications involving video codec translation from HDTV formats to mobile phone screen format video streams, where a relatively large bandwidth is received and processed, but less data is communicated on the transmit lanes since the transmit lanes are used to communicate data requests. If the NIC transmit lanes are dedicated (i.e. not shared), then arbitration among the NICs may not have to be used as the switch can have built-in capabilities to handle the simultaneous transactions of dedicated transmit lanes, regardless of whether the receive lanes are shared or not. For either the topology of
An optical splitter can perform splitting and combining functions on optical signals. The optical splitters can be based on the use of optical waveguides and micro-mirrors, or similar technologies. An optical signal sent over a transmit (T) lane from a NIC 212 is propagated by a respective optical splitter 504 towards the switch interface port.
In the reverse direction, an optical splitter 508 directs an optical signal from the switch interface port towards the receive (R) lane of the corresponding NIC 212.
In some examples, the groups 502 and 506 of optical propagation devices can be part of a single physical component. In different examples, the groups 502 and 506 of optical propagation devices can be part of two different physical components, where one physical component includes the group 502 of optical propagation devices, and another physical component includes the group 506 of optical propagation devices.
According to other implementations,
The other four taps of the five-tap bus device 520 are connected over respective M-fiber optical links 526, 528, 530, and 532 to respective 1×M ferrules 534, 536, 538, and 540 to corresponding NICs 212.
In
Signals transmitted by the driver 606 are received by a receiver 608 in the NIC 212 of an electronic device 102. An oval 636 represents an electrical-optical converter that converts received optical signals into electrical signals to provide to the receiver 608.
In the example of
The recovered clock frequency is provided from the CDR circuit 612 to a clock phase adjustment block 614 and the de-serializer 610 in the NIC 212. The clock phase adjustment block 614 in turn produces a phase-adjusted output clock that is used to drive a serializer 616 and a driver 618 in the NIC 212. The driver 618 transmits a data stream to the switch interface 214. An oval 630 represents an electrical-optical converter of the NIC 212.
A data stream is received by a receiver 620 in the switch interface 214 (an oval 632 represents an electrical-optical converter of the switch interface 214). The output data stream from the receiver 620 is provided to a de-serializer 622 and a CDR circuit 624 in the switch interface 214. Note that there is a receiver 620 and a CDR circuit 624 for each lane.
In some examples, to minimize (or reduce) clock signal lock and clock recovery times, a clock phase delta is calculated by the clock phase delta computation block 626 in the switch interface 214. The clock phase delta can refer to the difference in phase between the clock signal of the local clock source 602 in the switch interface 214 and the recovered clock in the NIC 212. In specific examples, calculation of the clock phase delta can be performed during each NIC's PMD (physical medium dependent) training period in a Multi-point MAC Control Protocol (MPCP) layer (as described in IEEE 802.3ah).
The clock phase delta is sent to the NIC's clock phase adjustment block 614 via the NIC's MPCP layer. Each NIC's transmit clock phase is adjusted by its phase adjustment block 614 until the signal received at the switch interface receiver 620 is synchronized with the local clock source 602. The clock phase delta is recalculated repeatedly by the clock phase delta computation block 626 and sent (if adjustment is to be performed at the NIC 212) to the NIC's phase adjustment block 614. The clock phase delta can be sent in either existing messaging or new messaging, such as a protocol data unit (PDU) of the MPCP layer.
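The repeated compute-and-adjust loop described above can be illustrated with a toy numeric model. Real PMD training operates on hardware clock phases, not floating-point values; the step size and tolerance below are assumptions made for the sketch.

```python
# Toy numeric sketch of the phase-delta feedback loop: the switch's
# computation block repeatedly reports the delta, and the NIC's phase
# adjustment block steps its transmit clock phase toward the switch's
# local clock phase. Step size and tolerance are illustrative.

def train_phase(nic_phase, switch_phase, step=0.25, tolerance=0.01):
    """Iterate until the NIC transmit phase matches the switch clock phase."""
    iterations = 0
    while abs(switch_phase - nic_phase) > tolerance:
        delta = switch_phase - nic_phase   # clock phase delta (block 626)
        nic_phase += step * delta          # phase adjustment (block 614)
        iterations += 1
    return nic_phase, iterations

phase, n = train_phase(nic_phase=0.0, switch_phase=1.0)
assert abs(phase - 1.0) <= 0.01
```

The loop converges geometrically (each pass shrinks the residual delta by the step factor), which is why repeated recalculation of the delta, rather than a single correction, is used.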
Although
If multiple lanes of a multi-lane port in the NICs 212 of the electronic devices 102 are enabled (such as according to the
In accordance with some implementations, an arbitration mechanism can be provided to control NICs sharing the switch interface port such that just one NIC is granted access to transmit at a time. The arbitration mechanism can be implemented in the switch interface 214 and in each of the NICs 212.
A switch interface port (e.g. switch interface port 0 in
The switch interface port next sends (at 704) a CTS (Clear to Send) frame to a selected NIC (e.g. NIC1). As noted in
In response to the CTS message, the selected NIC (e.g. NIC1) transmits (at 706) data to the switch interface port. The transmitted data can be in one or multiple MTS (More to Send) frames, where each MTS frame can include a data payload. The transmission of the MTS frame(s) is during the CTS window (indicated by the CTS window size in the CTS frame). In response to each MTS frame transmitted by the selected NIC, the switch interface port unicasts (at 708) an acknowledgement (ACK) of the MTS frame.
The selected NIC (e.g. NIC1) next sends (at 710) an ETS (End to Send) frame to indicate end of transmission by the selected NIC. At least one information element in the ETS frame can be set as follows: (1) the information element can be set to a first value to indicate that the transmit buffer of the selected NIC became empty (due to data in the transmit buffer having been transmitted) before the CTS window size was used, or (2) the information element can be set to a second value to indicate that the CTS window size was used up before the transmit buffer of the selected NIC became empty.
In response to the ETS frame, the switch interface port unicasts (at 712) an STS frame to the selected NIC (e.g. NIC1).
NIC1 then sends (at 714) an ACK of the STS frame (712), and turns off its transmitter. The switch interface 214 can then select the next NIC (e.g. NIC2) to perform transmission on the shared bus. The selection of the next NIC can use a round-robin arbitration scheme or other type of arbitration scheme.
The switch interface port then unicasts (at 716) a CTS frame to NIC2, with the CTS frame containing a CTS size. Tasks 718, 720, and 722 are similar to tasks 706, 708, and 710, respectively, as discussed above.
Upon receiving the ETS frame at 722, the switch interface 214 may detect that NIC2 still has more data to transmit in its transmit buffer, but had to stop transmitting due to expiration of the CTS window. In this case, the switch interface 214 can re-grant the shared bus to NIC2 again, by unicasting (at 724) a CTS frame to NIC2. Tasks 726, 728, 730, 734, and 736 are similar to tasks 706, 708, 710, 712, and 714, respectively, as discussed above.
The process of
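The CTS/MTS/ETS exchange described above can be sketched as a simplified round-robin grant loop. This is a hypothetical model, not the disclosed frame formats: frames are reduced to buffer entries, ACKs are implicit, and the re-grant decision is made explicitly by the caller.

```python
# Simplified, hypothetical model of the shared-bus arbitration exchange:
# the switch interface port grants a CTS window to one NIC at a time,
# drains MTS frames from its transmit buffer, and learns from the ETS
# whether the buffer emptied or the window expired.

from collections import deque

def grant_window(nic_queue, tx_buffers, cts_window):
    """Grant the bus to the NIC at the head of the queue (round-robin)
    and drain up to cts_window frames from its transmit buffer.
    Returns (nic, frames_sent, ets_reason)."""
    nic = nic_queue[0]
    nic_queue.rotate(-1)               # round-robin selection of next NIC
    buf = tx_buffers[nic]
    sent = 0
    while buf and sent < cts_window:   # MTS frames, each ACKed by the port
        buf.pop(0)
        sent += 1
    ets = "buffer_empty" if not buf else "window_expired"
    return nic, sent, ets

nics = deque(["NIC1", "NIC2"])
buffers = {"NIC1": [1, 2], "NIC2": [1, 2, 3, 4, 5]}

assert grant_window(nics, buffers, cts_window=4) == ("NIC1", 2, "buffer_empty")
nic, sent, ets = grant_window(nics, buffers, cts_window=4)
assert (nic, sent, ets) == ("NIC2", 4, "window_expired")
if ets == "window_expired":
    nics.rotate(1)  # re-grant: put the window-expired NIC back at the head
assert grant_window(nics, buffers, cts_window=4) == ("NIC2", 1, "buffer_empty")
```

The ETS reason corresponds to the information element of the ETS frame: it lets the switch interface decide whether to re-grant the same NIC (window expired with data remaining) or to move on to the next NIC in the arbitration order.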
When multiple NICs share a bus to a switch interface port, a NIC's receive buffer (which buffers data transmitted from the switch interface port to the NICs sharing the bus) can be overrun, meaning that the receive buffer fills up and is unable to buffer any further data transmitted by the switch interface port. During a time window assigned to another NIC, during which a particular NIC is unable to transmit over the shared bus, the particular NIC would not be able to provide an overrun indication to the switch interface port (to cause the switch interface port to pause transmission of data).
To address the foregoing issue, various mechanisms can be implemented. For example, the receive buffer of each NIC can be increased in size to allow the receive buffer to sink traffic at the traffic communication rate from the switch interface port during time windows assigned to other NICs.
Alternatively, a mechanism can be provided to allow transmission from the switch interface port to a NIC only during the NIC's assigned time window so that the NIC can respond with an overrun indication if the NIC's receive buffer reaches a predefined depth.
As yet another example, it is assumed that a NIC has multiple receive queues that are associated with respective priorities. In other words, a first of the receive queues is used to buffer data associated with a first priority, a second of the receive queues is used to buffer data associated with a second priority, and so forth. During initialization of the NIC, the NIC can send Q-Size[p] for each of its receive queues (where p can have different values to represent respective priorities). The parameter Q-Size[p] indicates the size of the corresponding receive queue (for receiving traffic of priority p). Also, the NIC sends Q-Depth[p] for each of its receive queues at the end of its assigned time window (during which the NIC is able to transmit over the shared bus). The parameter Q-Depth[p] represents the depth of the receive queue for priority p. The switch interface can maintain Q-Size[n,p] and Q-Depth[n,p] for each NIC (where n represents the corresponding NIC) and priority (p). During a time window not assigned to NIC n, data sent from the switch interface port is controlled to be capped at Q-Avail[n,p]=Q-Size[n,p]−Q-Depth[n,p].
In further examples, a NIC can also send a parameter Q-AvgDrainRate[p], which represents a weighted running average of how fast the NIC is able to absorb or sink traffic for each corresponding priority p. The parameter Q-AvgDrainRate[p] can be used by the switch interface to calculate a dynamic parameter Q-Avail[n,p](t), given the NIC's last known Q-Depth[n,p] and the amount of data transmitted from the corresponding switch interface's egress queue [n,p]. The dynamic parameter Q-Avail[n,p](t) can be used to calculate Q-Avail[n,p] for the muted NICs to control the amount of data to transmit from the switch interface port.
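The flow-control bookkeeping of the preceding two paragraphs can be sketched as follows. Parameter names follow the text (Q-Size, Q-Depth, Q-AvgDrainRate); the exact formula for the time-dependent estimate is an illustrative assumption, since the patent specifies only its inputs.

```python
# Sketch of the receive-queue availability computations described above.
# Units (frames vs. bytes) and the drain-rate model are assumptions.

def q_avail(q_size, q_depth):
    """Static cap: space left in NIC n's priority-p receive queue,
    Q-Avail[n,p] = Q-Size[n,p] - Q-Depth[n,p]."""
    return q_size - q_depth

def q_avail_dynamic(q_size, last_q_depth, drain_rate, elapsed, sent_since):
    """Estimate current availability from the last reported depth, the
    NIC's average drain rate (Q-AvgDrainRate), and the amount sent from
    the switch's egress queue since that report. Clamped to queue size."""
    est_depth = max(0, last_q_depth + sent_since - drain_rate * elapsed)
    return min(q_size, q_size - est_depth)

# Last reported depth 40 in a 64-entry queue: static cap is 24.
assert q_avail(q_size=64, q_depth=40) == 24

# With a drain rate of 8 per unit time over 5 units, and 10 more units
# sent meanwhile, the estimated depth drops to 10, so 54 are available.
assert q_avail_dynamic(64, 40, drain_rate=8, elapsed=5, sent_since=10) == 54
```

The switch interface would evaluate the dynamic estimate per NIC and per priority to cap transmissions to NICs outside their assigned time windows, as described above.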
Note that certain NICs support a shared receive memory pool, which can be used to expand the size of a receive buffer for multiple traffic priorities. Information relating to the size of this shared receive memory pool can also be communicated to the switch interface for use in determining how much data can be sent by the switch interface port to the NIC.
Alternatively, some combination of the foregoing techniques can be used.
Machine-readable instructions of modules described above (including the control logic 322 or switch logic 302 of
Data and instructions are stored in respective storage devices, which are implemented as one or more computer-readable or machine-readable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; or other types of storage devices. Note that the instructions discussed above can be provided on one computer-readable or machine-readable storage medium, or alternatively, can be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes. Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components. The storage medium or media can be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions can be downloaded over a network for execution.
In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some or all of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.
Filing Document | Filing Date | Country | Kind | 371(c) Date
---|---|---|---|---
PCT/US12/33179 | 4/12/2012 | WO | 00 | 6/27/2014