As data volume and processing requirements increase, computing clusters grow in size and, hence, inter-server communication requirements correspondingly increase. Traditional Ethernet and other network-based fabrics, such as InfiniBand, have full-fledged management support. However, the performance, cost, and power overheads can be significant. On the other hand, recent work on using a motherboard-level I/O-interconnect, such as PCIe, as a high-speed network fabric shows promising performance and energy efficiency results, but also reveals challenges in managing such high-speed I/O-interconnect based networks. For example, a key challenge is for such I/O-interconnects to support advanced management features such as fault tolerance, end-to-end flow control, and quality-of-service (QoS). These features are supported by traditional networks such as Ethernet, but not by I/O-interconnects such as PCIe mainly because such features are often expensive for I/O-interconnects.
Certain examples are described in the following detailed description and in reference to the drawings, in which:
Techniques described herein relate generally to the management of a high-speed network that lacks management features via another lower-speed network that includes such management features. More specifically, techniques described herein relate to the combination of a high-speed I/O-interconnect, such as a Peripheral Component Interconnect Express (PCIe) network, with a separate out-of-band network, such as an Ethernet network. Such techniques may be used to simultaneously provide high-speed, energy-efficient data transfer and rich management features without significantly increasing the cost and complexity of the networking system. In various examples, the separate out-of-band network may be a low-bandwidth, low-cost network that provides reliable device discovery, registration, and other management features for a high-bandwidth I/O-interconnect that lacks such management features.
Each of the servers 106 may also include a storage device 112 that is configured to store data. Such data may include data that is to be transferred between the servers 106 via the PCIe network 102, or data that is to be transferred between any of the servers 106 and a client 114 via the Ethernet network 104.
The Ethernet network 104 may be used to facilitate communication between the client 114 and the servers 106. For example, data that is stored within a storage device 112 of one of the servers 106 may be sent to, or received from, the client 114 or any of the servers 106. For the sake of clarity, the present disclosure describes the network 104 as an Ethernet network. However, it will be appreciated that other types of networks may also be used in accordance with examples. For example, any type of network that provides management features may be used, such as Ethernet, InfiniBand, or Fibre Channel, among others.
In various examples, the client 114 is configured to provide network administrative functions. The client 114 may be any type of computing device, such as a desktop computer, laptop computer, tablet computer, server, or mobile phone, among others. As shown in
In various examples, the PCIe network 102 includes any suitable number of network links 118 and network switches 120 that are configured to communicably couple the servers 106 within the networking system 100. The network switches 120 may be rack-level switches. The network links 118 and network switches 120 of the PCIe network 102 may facilitate communications between the servers 106. For example, data that is stored within the storage device 112 of one of the servers 106 may be sent to, or received from, any of the other servers 106 through the PCIe network 102.
In some examples, the client 114 may request data that is distributed between more than one of the servers 106. The server 106 to which the client 114 makes the request may have access to some of the data. However, the server 106 may also have to gather additional data from any of the other servers 106. Once the additional data has been gathered, all of the requested data can be sent from the server 106 to which the client 114 made the request back to the client 114.
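The gather step described above can be sketched as follows. This is a minimal illustration only, assuming each server holds a simple key-value shard. The `Server` class, its `peers` list, and `handle_client_request` are hypothetical names, and the in-process peer lookups merely stand in for transfers over the PCIe network 102.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for one of the servers 106."""
    name: str
    storage: dict = field(default_factory=dict)  # models the storage device 112
    peers: list = field(default_factory=list)    # servers reachable over the fast fabric

    def read_local(self, key):
        return self.storage.get(key)

    def handle_client_request(self, keys):
        """Serve a client request, gathering missing items from peer servers."""
        result = {}
        for key in keys:
            value = self.read_local(key)
            if value is None:
                # Assumption: a peer lookup models a transfer over the PCIe network.
                for peer in self.peers:
                    value = peer.read_local(key)
                    if value is not None:
                        break
            if value is not None:
                result[key] = value
        return result


a = Server("a", {"x": 1})
b = Server("b", {"y": 2})
c = Server("c", {"z": 3})
a.peers = [b, c]

# The client asks server "a" for everything; "a" gathers "y" and "z" from its peers.
response = a.handle_client_request(["x", "y", "z", "missing"])
```

In this sketch, keys that no server holds are simply omitted from the response that is returned to the client.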
For the sake of clarity, the present disclosure describes the network 102 as a PCIe network. However, it will be appreciated that other types of networks may also be used. For example, the network 102 may be any type of high-speed I/O-interconnect based fabric that lacks management features, such as PCIe, HyperTransport, or other enclosure-level interconnects.
In various examples, the Ethernet network 104 is configured to manage the PCIe network 102 such that the proper operation of the PCIe network 102 is maintained. This may be accomplished via input from a network management agent 122 and/or a monitoring agent 124. As shown in
The network management agent 122 and the monitoring agent 124 may each include hardware, software, or firmware that is configured to control the configuration of the PCIe network 102 via input from the Ethernet network 104. In some examples, the network management agent 122 controls the functioning of the PCIe network 102 via network management requests that include specific actions to be performed on the PCIe network 102. Further, the monitoring agent 124 may be configured to monitor the PCIe network 102 to determine whether the actions specified by the network management requests are implemented within the PCIe network 102.
In addition, each of the servers 106 may include a driver 126 that receives management instructions from the network management agent 122 via the Ethernet network 104, for example, and translates the management instructions into actions to be performed on the PCIe network 102. The driver 126 may be configured to control the transfer of data packets over the PCIe network 102 in accordance with the specified configuration. In various examples, the driver 126 also provides a software interface for the PCIe network 102 that enables the client 114 to access the PCIe network 102 for configuration purposes. The driver 126 may be configured to provide such functionalities in either a kernel mode or a user mode, depending on the details of the specific implementation.
It is to be understood that the block diagram of
The method begins at block 202, at which a PCIe management request is issued via the Ethernet network. The PCIe management request may be issued by the network management agent 122 that resides on the client 114, or on any of the servers 106, as discussed above with respect to
In some examples, the PCIe management request is sent by the network management agent automatically. For example, if a server is disconnected from the PCIe network, the network management agent may send a PCIe management request automatically after the server is rebooted. In other examples, the network management agent may issue a PCIe management request in response to input from a user, such as a network administrator, via a client computing device.
In some examples, the PCIe management request is a request to include a rebooted server in the PCIe network. Such a PCIe management request may provide fault tolerance, as well as server fault isolation, for the PCIe network. For example, if one server fails, and the PCIe network's connection to the server is lost, the Ethernet network may allow for the rediscovery and registration of the server using the PCIe management request. The network management agent may automatically send such a PCIe management request in response to the failure of a particular server.
In other examples, the PCIe management request is a request to change a data flow rate between servers. Such a PCIe management request may be used to provide out-of-band flow control and quality of service (QoS) information from the Ethernet network to the PCIe network. In response to a PCIe management request related to a change in data flow rate, the driver may adjust the data flow rate between servers using any suitable means. For example, the driver may adjust the data buffer size or data packet priorities for different flows between the servers based on the PCIe management request.
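As a hedged illustration of how a driver might act on such a request, the following sketch adjusts a per-flow buffer size and priority. The `FlowState` and `FlowControlDriver` names, the default buffer size, the 4096-byte floor, and the use of buffer size as a proxy for data flow rate are all assumptions made for this example and are not specified by the present disclosure.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    """Hypothetical per-flow settings held by the driver."""
    buffer_bytes: int = 64 * 1024
    priority: int = 0  # a higher value is served first in this sketch


class FlowControlDriver:
    def __init__(self):
        self.flows = {}  # (source, destination) -> FlowState

    def apply_rate_request(self, src, dst, rate_scale=None, priority=None):
        """Adjust a flow's buffer size and/or priority from a management request."""
        flow = self.flows.setdefault((src, dst), FlowState())
        if rate_scale is not None:
            if rate_scale <= 0:
                raise ValueError("rate_scale must be positive")
            # Assumption: buffer size is used as a proxy for the flow's rate.
            flow.buffer_bytes = max(4096, int(flow.buffer_bytes * rate_scale))
        if priority is not None:
            flow.priority = priority
        return flow


driver = FlowControlDriver()
throttled = driver.apply_rate_request("server-1", "server-2", rate_scale=0.5)
boosted = driver.apply_rate_request("server-1", "server-3", priority=7)
```

Here one flow is throttled by halving its buffer while another is given a higher priority, which corresponds to the two adjustment options named above.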
Further, in some examples, the PCIe management request is a device discovery request. The device discovery request may instruct the driver 126 to identify devices connected to the PCIe network, such as switches, servers, and the like. Devices discovered by the driver 126 may be reported back to the network management agent through the Ethernet network.
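One possible way to model device discovery is a breadth-first walk of the fabric starting from the server on which the driver resides. The sketch below assumes the topology is available as an adjacency map. The `discover_devices` function and the device names are hypothetical, and a real driver would enumerate devices through the fabric itself rather than through a dictionary.

```python
from collections import deque


def discover_devices(topology, start):
    """Breadth-first walk of the fabric from `start`, returning reachable devices."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        device = queue.popleft()
        order.append(device)
        for neighbor in topology.get(device, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


# Hypothetical topology: two servers behind one rack-level switch, one server unplugged.
topology = {
    "server-1": ["switch-1"],
    "switch-1": ["server-1", "server-2"],
    "server-2": ["switch-1"],
    "server-3": [],
}

# The resulting list is what would be reported back over the management network.
report = discover_devices(topology, "server-1")
```

A device that is absent from the report, such as the unplugged server in this example, may be a candidate for a later rediscovery request.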
At block 204, the network management agent forwards the PCIe management request to a PCIe driver. In various examples, the PCIe management request is forwarded to the PCIe driver within any number of specific servers relating to the PCIe management request. In some examples, the Ethernet packet used to communicate the PCIe management request may include header information that identifies it as a PCIe management request. The header information may also identify one or more target servers for the PCIe management request.
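The present disclosure does not define a layout for this header information. Purely as an illustration, the sketch below assumes a payload header made of a four-byte marker, a one-byte request type, a one-byte target count, and a list of two-byte target server identifiers. The `MAGIC` value, the field widths, and the function names are all hypothetical.

```python
import struct

MAGIC = 0x50434D52               # hypothetical marker identifying a PCIe management request
HEADER = struct.Struct("!IBB")   # marker, request type, target count (network byte order)


def encode_request(request_type, targets, payload=b""):
    """Build a management packet body: header, target server IDs, then payload."""
    if len(targets) > 255:
        raise ValueError("too many target servers for a one-byte count")
    header = HEADER.pack(MAGIC, request_type, len(targets))
    target_ids = struct.pack(f"!{len(targets)}H", *targets)
    return header + target_ids + payload


def decode_request(packet):
    """Return (request_type, targets, payload), or None if not a management packet."""
    if len(packet) < HEADER.size:
        return None
    magic, request_type, count = HEADER.unpack_from(packet)
    if magic != MAGIC:
        return None
    end = HEADER.size + 2 * count
    if len(packet) < end:
        return None
    targets = list(struct.unpack_from(f"!{count}H", packet, HEADER.size))
    return request_type, targets, packet[end:]


packet = encode_request(2, [7, 9], b"rate=0.5")
decoded = decode_request(packet)
```

A receiving driver could use the marker check to ignore ordinary traffic and the target list to decide whether the request applies to its own server.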
At block 206, the PCIe management request is translated into an action via the PCIe driver. The action relates to a specific PCIe management function. For example, the action may include configuring the PCIe switch topology and bandwidth, or controlling the transfer rate of data packets for specific flows. The action may also include changing the width, voltage, or frequency of PCIe links, or diagnosing related functions, among others. In some examples, the PCIe driver translates the PCIe management request into a number of actions relating to the specific PCIe management function.
At block 208, upon completion of the PCIe management request, the PCIe driver sends an acknowledgement to the source of the PCIe management request through the Ethernet network. At block 210, changes to the PCIe network that were caused by the action relating to the PCIe management request are observed via the monitoring agent. Such changes may include both device-level and user-level changes to the PCIe network.
At block 212, the monitoring agent determines whether the PCIe management request has been satisfied. If it is determined that the PCIe management request has been satisfied, the method 200 is completed at block 214. In some examples, the monitoring agent issues a confirmation message through the Ethernet network to indicate that the PCIe management request has been satisfied. However, if it is determined that the PCIe management request has not been satisfied, an additional PCIe management request may be issued via the monitoring agent at block 216. This may be repeated until the PCIe management function that was specified by the original PCIe management request has been satisfied.
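The issue, observe, and reissue cycle of blocks 202 through 216 can be sketched as a simple loop. This is an illustrative sketch rather than a definitive implementation. The `run_management_request` function, its callback parameters, the `FakeFabric` test double, and the `max_attempts` limit are assumptions, since the description above places no bound on the number of repetitions.

```python
def run_management_request(request, apply_action, observe_state, is_satisfied,
                           max_attempts=3):
    """Issue a request, observe the fabric, and reissue until it is satisfied.

    Returns the number of attempts used, or None if the limit was reached.
    """
    for attempt in range(1, max_attempts + 1):
        acknowledged = apply_action(request)   # blocks 202-208: issue through acknowledge
        state = observe_state()                # block 210: monitoring agent observes changes
        if acknowledged and is_satisfied(request, state):  # block 212
            return attempt                     # block 214: done
        # block 216: fall through and issue an additional request
    return None


class FakeFabric:
    """Hypothetical fabric whose first link-width change is silently dropped."""

    def __init__(self):
        self.link_width = 4
        self.calls = 0

    def apply(self, request):
        self.calls += 1
        if self.calls >= 2:
            self.link_width = request["link_width"]
        return True

    def observe(self):
        return {"link_width": self.link_width}


fabric = FakeFabric()
attempts = run_management_request(
    {"link_width": 8},
    fabric.apply,
    fabric.observe,
    lambda req, state: state["link_width"] == req["link_width"],
)
```

In this sketch the acknowledgement alone is not treated as proof of success; the request counts as satisfied only when the observed state matches what was requested.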
The method begins at block 302, at which a network management request is received at the driver through the first network. The network management request may be any type of request that relates to managing the configuration of the second network.
At block 304, the driver translates the network management request into an action to be performed on the second network. The action may be, for example, a change to the size of a data buffer on one of the servers, or a change to the priority of data packets sent between servers. In addition, if the network management request is a request to include a rebooted server in the second network, the action may be a notification to a switch of the second network to include the rebooted server. Further, in some examples, the network management request is translated into multiple actions.
At block 306, the driver issues the action to one or more components of the second network. The components of the second network may be configured to perform the specified action. The action may result in the implementation of a management function corresponding to the network management request. For example, the action may change a transfer rate of data exchanged between servers on the second network, or may instruct a switch of the second network to include a rebooted server on the second network. Further, in some examples, if the network management request is translated into multiple actions, all of the actions may be used to implement a specific management function.
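Blocks 302 through 306 may be pictured as a receive, translate, and issue pipeline. The sketch below is one hypothetical arrangement. The `Switch` and `Driver` classes, the request dictionaries, and the `"rejoin"` and `"qos"` request types are invented for illustration and do not correspond to any interface defined in the present disclosure.

```python
class Switch:
    """Hypothetical second-network switch that tracks members and flow priorities."""

    def __init__(self):
        self.members = set()
        self.flow_priority = {}

    def perform(self, action, **params):
        if action == "include_server":
            self.members.add(params["server"])
        elif action == "set_priority":
            self.flow_priority[params["flow"]] = params["priority"]
        else:
            raise ValueError(f"unsupported action: {action}")


class Driver:
    def __init__(self, switch):
        self.switch = switch

    def translate(self, request):
        """Block 304: map one request onto one or more actions."""
        kind = request["type"]
        if kind == "rejoin":
            return [("include_server", {"server": request["server"]})]
        if kind == "qos":
            return [("set_priority", {"flow": flow, "priority": prio})
                    for flow, prio in request["priorities"].items()]
        raise ValueError(f"unknown request type: {kind}")

    def handle(self, request):
        """Blocks 302-306: receive the request, translate it, issue each action."""
        actions = self.translate(request)
        for name, params in actions:
            self.switch.perform(name, **params)
        return len(actions)


switch = Switch()
driver = Driver(switch)
rejoin_count = driver.handle({"type": "rejoin", "server": "server-3"})
qos_count = driver.handle({
    "type": "qos",
    "priorities": {("server-1", "server-2"): 5, ("server-1", "server-3"): 1},
})
```

The second request shows a single network management request being translated into multiple actions, all of which together implement one management function.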
The various software components discussed herein may be stored on the tangible, non-transitory computer-readable medium, as indicated in
The block diagram of
While the present techniques may be susceptible to various modifications and alternative forms, the examples discussed above have been shown only by way of example. It is to be understood that the technique is not intended to be limited to the particular examples disclosed herein. Indeed, the present techniques include all alternatives, modifications, and equivalents falling within the true spirit and scope of the appended claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2012/034977 | 4/25/2012 | WO | 00 | 7/25/2014 |

Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/162547 | 10/31/2013 | WO | A |

Number | Date | Country |
---|---|---|
20150074250 A1 | Mar 2015 | US |