A computer system may have a number of computers and computer-related components that are electrically connected together. One type of computer system is a rack-based system in which modular computer units (or “chassis units”) are mounted to a frame, or rack. A given chassis unit may have a set of processing nodes as well as networking and storage components; the chassis unit may also have a chassis management controller, which allows the components to be remotely monitored and managed.
A rack-based computer system may have modular units (called “chassis units” herein) that are mounted to a frame, or rack. The chassis units may be mounted to frame members of the rack and may be located in an interior space that is defined by the frame members. A given chassis unit may contain any of a number of different computer components and serve any of a number of different purposes. As examples, a given chassis unit may be a compute server; an application server; a storage server; an edge processing server; a blade enclosure; an enclosure containing tray-mounted server nodes; and so forth.
The chassis unit may contain a chassis management controller, which is connected to a network to allow remote monitoring and management of components of the chassis unit. As used herein, a “chassis management controller” is a specialized service processor that, in general, manages and reports on the operation of a computer environment (such as the chassis unit), which may have multiple servers, storage devices, and networking devices. The chassis management controller may operate independently of the operating system of the chassis unit. The chassis management controller may be located on the motherboard or main circuit board of the server or other device to be monitored. The fact that a chassis management controller is mounted on a motherboard of the managed server/hardware or otherwise connected or attached to the managed server/hardware does not prevent the chassis management controller from being considered “separate” from the server/hardware. As used herein, a chassis management controller has management capabilities for sub-systems of a computing device, and is separate from a processing resource that executes an operating system of a computing device. The chassis management controller is separate from a processor, such as a central processing unit, which executes a high level operating system or hypervisor on a system.
The chassis units may have associated relative locations within the rack, with the locations corresponding to chassis location identifications (IDs), or “chassis IDs.” A given chassis unit may learn its chassis ID through a chassis unit location indicator of the unit. The chassis unit location indicator may be an electromechanical device (e.g., a dial indicator) with a settable position-based indicator (e.g., a dial rotated to point to a particular number) to identify the chassis ID; and the device outputs an electrical indication (e.g., a digital value) that represents the chassis ID. For example, for a vertically-oriented rack in which the chassis units are arranged in a vertical stack, the lowermost chassis unit may be “chassis unit 1” having a chassis ID of “1” (as set by a dial on the unit pointing to “1”, for example); the chassis unit directly above chassis unit 1 may be “chassis unit 2” having a chassis ID of “2”; and so forth.
The chassis units may also have rack indicators (e.g., dial indicators), which are set to indicate a rack ID for the units. All chassis units of the same rack have the same rack ID; and this information may be used to associate chassis units with their racks in a network that contains multiple rack-based computer systems.
In addition to the chassis units and rack, a rack-based computer system may contain one or multiple support units (called “rack support units” herein), which may be disposed on the outside of the rack (i.e., not disposed in the interior space where the chassis units are mounted) and provide one or multiple support functions (e.g., provide power distribution and/or thermal cooling) for the chassis units. For example, a rack-based computer system may have one or multiple power distribution units (PDUs). A PDU may, for example, contain circuitry to receive three phase AC power and convert the three phase AC power to single phase AC power for the chassis units, which convert the AC power into DC power (via AC-to-DC power supplies or high power DC rectifiers) and distribute the DC power to the chassis unit backplane. In accordance with further example implementations, a PDU may convert AC power into DC power, condition DC power, distribute DC power, and so forth. As a specific example of a rack support unit, a given rack-based computer system may contain a pair of PDUs, where each PDU is mounted on a different side of the rack and extends in the direction along which the chassis units are mounted. The PDU may be connected by a network cable (e.g., an Ethernet cable) to the network port (e.g., an Ethernet port) of a given chassis management controller of the system for purposes of allowing remote monitoring and management of the PDU. In this manner, the network connection allows remote sensing of power demands, controlling the enablement of power conditioning circuitry per chassis unit, controlling whether a chassis unit is powered up or down, and so forth.
As another example of a rack support unit, a rack-based computer system may have a cooling distribution unit (CDU), which is a heat exchanger that removes waste heat from the computer system so that the components of the system remain within an acceptable thermal operating range. As an example, a CDU may be a liquid-based cooling system that includes a secondary cooling loop to circulate and receive a liquid coolant that is circulated near heat dissipating components of the chassis units and the heat sinks mounted to these components. The CDU may contain one or multiple heat exchangers to remove thermal energy from the liquid coolant of the secondary cooling loop and transfer this energy to a primary cooling loop. As another example, the CDU may be an air-to-coolant heat exchanging unit. Regardless of whether the CDU is a liquid-based cooling system or an air and liquid-based cooling system, the CDU may be connected by a network cable (e.g., an Ethernet cable) to a network port (e.g., an Ethernet port) of a given chassis management controller for purposes of allowing remote monitoring and management of the CDU (e.g., for purposes of sensing thermal conditions, controlling the circulating rate of the coolant, controlling fan speed, and so forth).
The rack-based computer system may be managed by a server, called a “rack manager” herein, which communicates with the chassis management controllers of the chassis units. In general, a chassis management controller allows the rack manager to gather information about and control an associated chassis unit of the rack-based computer system. The rack manager may manage multiple rack-based computer systems.
A given rack contains multiple chassis management controllers (one for each chassis unit). In accordance with example implementations, one of the chassis management controllers (called the “primary chassis management controller” herein) may be designated to be the master, or primary, chassis management controller for the rack; and the other chassis management controllers are designated as “secondary chassis management controllers.”
The primary chassis management controller may perform the function of discovering the presence of and information about the rack support units of the rack-based computer system, such as detecting the presence of CDUs and/or PDUs and retrieving information about these units. In this manner, the primary chassis management controller may, via cabling connections (described further herein) detect the presence of an installed rack support unit, communicate with the rack support unit to acquire data representing information about the rack support unit (called “field replaceable unit data” or “FRU data” herein) and communicate data to the rack manager representing the information about the rack support unit. As an example, the FRU data may be data representing an address (e.g., a media access control (MAC) address) of the support unit as well as possible other information about the support unit, such as a part number, hardware version number, firmware version number, serial number, and so forth.
A given rack-based computer system may be rather complex, in that the system may be formed from many different possible combinations of chassis units and support units. Moreover, there may be a considerable number of cable-based connections between these components, such as network cabling (e.g., Ethernet cabling), serial port cabling (e.g., RS232 cabling) and multi-pin cabling. The “multi-pin cabling” refers to cabling that has multiple conductors that communicate signals, such as, for example, network signals, serial signals, presence signals, general purpose input/output (GPIO) signals, and so forth. For example implementations that are described herein, display port (DP) cables are used for the multi-pin cabling, although other types of cable may be used for the multi-pin cabling, in accordance with further implementations.
As described further herein, the cable connections may be subject to predefined “best practices” connection rules. As examples, the rules may specify that a certain rack support unit is to be connected (via a network cable, such as an Ethernet cable) to the network port of a chassis management controller having a specific chassis ID; assign serial port connectors of the primary chassis management controller to specific corresponding rack support units; assign display port connectors of the primary chassis management controller as hub port connectors for a star network, which connect to specific corresponding secondary chassis management controllers; designate a specific secondary chassis management controller display port as being a spoke connector of the star network; and so forth.
Connecting the cabling among the components of the rack-based computer system is prone to human error. Moreover, other aspects of setting up a rack-based computer system may be subject to human error. As examples, a chassis ID dial indicator may incorrectly be set such that there are two chassis units that are assigned chassis ID 2; the lowermost chassis unit on a vertical rack may not be assigned chassis ID 1 by its dial indicator; the rack indicator dial on a chassis unit may incorrectly associate the chassis unit with the wrong rack ID; and so forth.
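As an illustration of how such indicator errors might be flagged once the chassis IDs of a rack are known, the following is a minimal sketch and not part of the described implementations. The function name `check_chassis_ids` is hypothetical, and the check for a missing chassis ID 1 is only a proxy for the "lowermost unit is not chassis ID 1" error, because the sketch has no knowledge of physical positions.

```python
from collections import Counter


def check_chassis_ids(chassis_ids):
    """Return a list of problems found in the chassis IDs reported for one rack.

    chassis_ids: iterable of integer chassis IDs, one per chassis unit.
    (Hypothetical helper; the input format is an assumption.)
    """
    problems = []
    counts = Counter(chassis_ids)

    # Two or more units whose dials point at the same number.
    for chassis_id, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"chassis ID {chassis_id} is assigned to {count} units")

    # No unit claims to be chassis unit 1.
    if 1 not in counts:
        problems.append("no unit is assigned chassis ID 1")

    return problems
```

For example, `check_chassis_ids([1, 2, 2, 3])` reports the duplicated chassis ID 2, while a correctly configured rack yields an empty list.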
In accordance with example implementations, a rack manager performs an automatic discovery process to learn information about a rack-based computer system, and from this information, the rack manager may automatically identify problems with the setup of the rack-based computer system, such as problems with network cabling and/or configuration problems. The information learned through the automatic discovery may include, for example, information about the chassis units and chassis management controllers installed in the rack; the identification of the primary chassis management controller; the locations of the chassis management controllers in the rack; chassis IDs of the chassis management controllers; internet protocol (IP) addresses of the chassis management controllers; the presence of support units, such as PDUs and a CDU; the IP addresses of the support units; FRU information about the support units; and so forth.
In accordance with example implementations that are described herein, through the automatic discovery process, the rack manager learns information about network connections between the support units and the chassis management controllers. For example, the connection rules for the rack-based computer system may specify that the network port of a certain PDU A is supposed to be connected to the network port of a network switch that is disposed in the secondary chassis management controller having chassis ID 2. Through the automatic discovery process, the rack manager learns network connection mappings, or associations, between the rack support units and the chassis management controllers so that, using the results of the network connection associations, the rack manager may determine whether, for example, the network port of PDU A is connected to the network switch of the secondary chassis management controller having chassis ID 2.
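The comparison between connection rules and discovered associations may be pictured with the following minimal sketch. The function name `find_cabling_violations`, the dictionary-based representation of the rules, and the rule for "PDU B" in the usage example are all assumptions made for illustration; only the "PDU A to chassis ID 2" rule comes from the example above.

```python
def find_cabling_violations(expected, discovered):
    """Compare connection rules against discovered associations.

    expected:   dict mapping a support unit name to the chassis ID it
                is supposed to be cabled to (the connection rules).
    discovered: dict mapping a support unit name to the chassis ID it
                was actually found behind.

    Returns a list of (unit, expected_chassis_id, found_chassis_id)
    tuples; found_chassis_id is None when the unit was not found.
    """
    violations = []
    for unit, want in sorted(expected.items()):
        got = discovered.get(unit)
        if got != want:
            violations.append((unit, want, got))
    return violations
```

With rules `{"PDU A": 2, "PDU B": 3}` and discovered associations `{"PDU A": 2, "PDU B": 4}`, the sketch reports only PDU B as miscabled.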
More specifically, in accordance with example implementations, each chassis management controller includes a layer two network switch, and the network switch has different network ports. One of these network ports may be connected to a top-of-the-rack (ToR) network switch for the rack, and the other network ports may be potentially connected, via network cabling (e.g., Ethernet cabling), to other components, such as a rack support unit. The network switch populates a layer two table, called a “forwarding table,” with a mapping between component identifiers (e.g., hardware addresses, such as MAC addresses) and the ports of the switch that are connected to components having these identifiers.
The network switch may learn this information and populate this table based on, for example, communications with the components that are connected to the ports. For example, a PDU may be connected by network cabling to a port of a given network switch of a given secondary chassis management controller. The PDU may issue a network frame (e.g., a frame corresponding to a Dynamic Host Configuration Protocol (DHCP) request or a frame corresponding to an Address Resolution Protocol (ARP) request) that contains a source address, which is the MAC address of the PDU. From this frame, the network switch learns, for example, that a component having the MAC address is connected to the port and updates the forwarding table accordingly.
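The source-address learning described above may be sketched as follows. This is a simplified model for illustration only: the class name `LearningSwitch` and its methods are hypothetical, and aging of entries, VLANs and frame forwarding are deliberately omitted.

```python
class LearningSwitch:
    """Toy model of layer two source-address learning (illustrative only)."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        # Forwarding table: source MAC address (lowercase) -> port number.
        self.forwarding_table = {}

    def receive_frame(self, port, src_mac):
        """Record that a frame with source address src_mac arrived on port."""
        if not 0 <= port < self.num_ports:
            raise ValueError(f"port {port} is out of range")
        # Learning: the sender is reachable through the ingress port.
        # A later frame from the same MAC on another port overwrites the entry.
        self.forwarding_table[src_mac.lower()] = port
```

In this model, a DHCP or ARP request from a PDU arriving on port 2 results in the entry `{"<PDU MAC>": 2}`; if the PDU is later re-cabled to port 3, its next frame updates the entry.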
In accordance with example implementations, the remote rack manager, as part of the automatic discovery process, requests the forwarding tables of the network switches of the chassis management controllers. A forwarding table for a given chassis management controller reveals the MAC addresses of the devices that are connected to the network switch of the chassis management controller and thus, reveals the support unit(s) that are connected to the chassis management controller by network cabling. In accordance with example implementations, the rack manager learns the chassis IDs and MAC addresses of all of the chassis management controllers from the controllers' DHCP requests; and the rack manager learns the MAC addresses of the support units from communications with the primary chassis management controller. Therefore, from the forwarding tables and the MAC addresses of the rack support units, the rack manager may associate specific chassis management controllers to specific rack support units; and the rack manager may use these associations to identify network cabling and/or configuration problems.
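A minimal sketch of this association step is shown below. The function name `associate_support_units` and the data shapes are assumptions. The sketch also assumes that each switch has a known uplink port toward the ToR switch and ignores entries learned on that port, because a broadcast frame from a support unit could otherwise be learned by switches to which the unit is not directly cabled; the description above does not state how such entries are handled.

```python
def associate_support_units(forwarding_tables, support_unit_macs, uplink_port=0):
    """Map each support unit to the chassis management controller it is cabled to.

    forwarding_tables: dict chassis_id -> {mac (lowercase): port} for the
                       switch in each chassis management controller.
    support_unit_macs: dict support unit name -> MAC address (from FRU data).
    uplink_port:       assumed port number facing the ToR switch; entries
                       learned on it are ignored.

    Returns dict support unit name -> (chassis_id, port).
    """
    associations = {}
    for name, mac in support_unit_macs.items():
        for chassis_id, table in sorted(forwarding_tables.items()):
            port = table.get(mac.lower())
            if port is not None and port != uplink_port:
                associations[name] = (chassis_id, port)
                break
    return associations
```

A support unit that is absent from the result was not found on any non-uplink port, which may itself indicate a missing or misplaced network cable.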
In accordance with further example implementations, the rack manager may discover the associations between the chassis management controllers and the rack support units using forwarding table information derived from one or multiple other network switches (i.e., from switch(es) other than the network switches of the chassis management controllers). For example, in accordance with further example implementations, the rack manager may request the forwarding table from the ToR network switch for the rack-based computer system. The ToR network switch is a layer two network switch, which has ports that are connected to the network ports of the chassis management controllers by network cabling. The rack manager uses its knowledge of the MAC addresses of the chassis management controllers and rack support units, in conjunction with the forwarding table mapping from the ToR network switch, to determine associations between the chassis management controllers and the rack support units. For example, from the forwarding table provided by the ToR network switch, the rack manager may learn that a given port of the ToR network switch is mapped to the MAC address of a given support unit and also mapped to the MAC address of a given chassis management controller.
Referring to
In general, a given rack-based computer system 110 may include multiple cartridges, or chassis units 118, where each chassis unit 118 may be mounted to a frame (i.e., a chassis), housed in an enclosure and include computer-related components, such as one or multiple processing nodes 122, a network switch 124 and an associated chassis management controller 126. In addition to the chassis units 118, the rack-based computer system 110 may include other components, such as one or multiple rack support units (e.g., one or multiple PDUs 130, and/or a CDU 134), a ToR network switch 119, and so forth.
As part of an automatic discovery process, the rack manager 170 may discover a particular chassis management controller 126 in response to the chassis management controller 126 providing a Dynamic Host Configuration Protocol (DHCP) request for an IP address. In this manner, the DHCP request is a broadcast domain request, which is not only seen by a DHCP server 160 that is coupled to the network fabric 150 but may also be observed by the rack manager 170. In accordance with example implementations, the DHCP request contains information identifying the sender of the DHCP request; and as such, the DHCP request from the chassis management controller 126 may contain a MAC address, a rack ID and a chassis ID of the controller 126. Therefore, by extracting this information from the DHCP request, the rack manager 170 may be able to associate a given chassis management controller 126 with a rack-based computer system 110 and a particular chassis location within that system 110; and the rack manager 170 may identify the primary chassis management controller 126 (e.g., the controller 126 having a chassis ID 1).
In accordance with example implementations, the rack ID, chassis ID and MAC address of the sending device are communicated via the DHCP request using DHCP vendor specific option 0x43, which defines 64 octets beginning at octet offset 278 for options; the format of each option is: code, length, data item. The MAC address of the sending device is at offsets 6 and 70. The sixty-four octets beginning at octet offset 278 may have the following packet format, in accordance with example implementations. The first four octets are a magic cookie; the next octet is vendor specific information code; the next octet is the length; the next two octets are code 2 and length; the next three octets identify the manufacturer; the next two octets are code 3 and length; the next two octets are the version (00 to FF); the next two octets are code 4 and length; the next eight octets are the rack and chassis IDs (in the format of “RnnnnCnn,” where “Rnnnn” represents a rack ID number between 0000 and 9999, and “Cnn” represents a chassis ID number between 00 and 99); and the last 39 octets contain padding data.
As described further herein, the PDUs 130 and CDUs 134 may also issue DHCP requests for purposes of acquiring IP addresses. These DHCP requests pass through the network switches (such as network switches 119 and 124) of the rack-based computer system 110 containing the PDUs 130 and CDUs 134, and accordingly, the network switches of the rack-based computer system 110 update their forwarding tables to map network switch ports to the MAC addresses of these components. As described herein, the rack manager 170 acquires information about the rack support units of a given rack-based computer system 110 from the primary chassis management controllers 126, including the MAC addresses of the support units. Using these MAC addresses, the rack manager 170 may acquire forwarding table information from the network switches of the rack-based computer system 110 for purposes of associating the chassis management controllers 126 with the support units that are connected by network cabling to the controllers 126.
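Summing the field widths listed above places the eight-octet “RnnnnCnn” field at octets 17 through 24 of the sixty-four octet block (4 + 1 + 1 + 2 + 3 + 2 + 2 + 2 = 17 octets precede it, and 17 + 8 + 39 = 64). The following minimal sketch extracts the rack and chassis IDs from such a block. The function name `parse_rack_chassis` is hypothetical, the field is assumed to be ASCII encoded, and fixed offsets are used for brevity; a more general parser would walk the code/length/data items instead.

```python
import re

OPTIONS_LEN = 64            # octets in the block that begins at offset 278
ID_FIELD = slice(17, 25)    # position of "RnnnnCnn", derived from the field widths
_ID_PATTERN = re.compile(r"R(\d{4})C(\d{2})")


def parse_rack_chassis(block):
    """Return (rack_id, chassis_id) from a 64-octet options block.

    Assumes the "RnnnnCnn" field is ASCII text at a fixed position.
    Raises ValueError if the block is malformed.
    """
    if len(block) != OPTIONS_LEN:
        raise ValueError(f"expected {OPTIONS_LEN} octets, got {len(block)}")
    text = block[ID_FIELD].decode("ascii", errors="replace")
    match = _ID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed rack/chassis field: {text!r}")
    return int(match.group(1)), int(match.group(2))
```

For a block whose ID field contains `R0042C07`, the sketch returns rack ID 42 and chassis ID 7.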
As depicted in
In accordance with example implementations, the memory 178 is a non-transitory memory that may be formed from such memory devices as magnetic storage devices, semiconductor devices, memristors, phase change memory devices, a combination of one or more of the foregoing memory devices, and so forth.
The chassis IDs may be set by electromechanical indicators, for example (e.g., dial indicators) located on the chassis management controllers 126. As an example, chassis management controller 126 of the chassis unit 118-1 corresponds to chassis ID 1; the chassis management controller 126 of the chassis unit 118-2 corresponds to chassis ID 2; and so forth. In accordance with example implementations, the chassis management controller 126 having chassis ID 1 is the primary chassis management controller 126, with the other chassis management controllers 126 being the secondary chassis management controllers for the rack-based computer system 110.
In accordance with example implementations, the chassis management controller 126 has a number of ports, such as example port connectors 202, 204, 208, 210, 212, 216, 220, 224 and 228 that are depicted in
Moreover, the cable connection rules may specify how cabling connections are to be made between rack support units and the chassis management controllers 126. For the example implementation that is depicted in
When a particular PDU 130 is installed in the rack-based computer system 110, the serial cable 272 is connected to the appropriate serial port of the primary chassis management controller 126 and allows the PDU 130 to assert a presence signal on the serial cable 272 to alert the primary chassis management controller 126-1 to the presence of the installed PDU 130. The primary chassis management controller 126 may then communicate over the serial cable 272 to retrieve FRU data from the PDU 130, including data representing the MAC address of the PDU 130 and other information about the PDU 130. The specific serial port connector to which the PDU 130 is connected allows the primary chassis management controller 126 to designate the PDU 130 as either PDU A or PDU B. In accordance with example implementations, the primary chassis management controller 126 may be connected by serial cabling to the CDU 134, such that the controller 126 may detect the presence of the CDU 134 and retrieve FRU data from the CDU 134.
Cabling connection rules may specify how network cabling 270 (e.g., Ethernet cabling) is to be connected among the rack support units and the chassis management controllers 126. As depicted in
As depicted in
Referring to
As depicted in
The chassis management controllers 126 issue DHCP requests, as depicted at 320, for purposes of acquiring IP addresses. These DHCP requests also include the rack and chassis IDs of the chassis management controllers 126 issuing the requests. Therefore, from the DHCP requests from the chassis management controllers 126, the rack manager 170 may then associate the chassis management controllers 126 with particular racks and learn the location of the chassis management controllers 126 within the rack, as depicted at 325.
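The bookkeeping implied by this step may be sketched as follows, assuming that each observed DHCP request has already been reduced to a (MAC address, rack ID, chassis ID) tuple. The function names and data shapes are hypothetical, and the convention that chassis ID 1 marks the primary chassis management controller follows the example implementations described herein.

```python
from collections import defaultdict


def build_rack_inventory(requests):
    """Group chassis management controllers by rack and chassis location.

    requests: iterable of (mac, rack_id, chassis_id) tuples extracted from
              DHCP requests (assumed input format).

    Returns dict rack_id -> {chassis_id: mac}, with chassis IDs in order.
    """
    racks = defaultdict(dict)
    for mac, rack_id, chassis_id in requests:
        racks[rack_id][chassis_id] = mac
    return {rack: dict(sorted(slots.items())) for rack, slots in racks.items()}


def primary_controller(inventory, rack_id, primary_chassis_id=1):
    """Return the MAC of the rack's primary controller, or None if unknown."""
    return inventory.get(rack_id, {}).get(primary_chassis_id)
```

A rack whose inventory has no entry for chassis ID 1 has no identifiable primary chassis management controller, which may point to an incorrectly set indicator.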
As depicted at 328, the PDU 130 also issues a DHCP request for purposes of obtaining an IP address. This DHCP request from the PDU 130 traverses a network switch 124 of a particular chassis management controller 126; and accordingly, this network switch 124 learns the MAC address of the PDU 130 (as depicted at 329) and populates its forwarding table with the MAC address-to-port association. The rack manager 170, if also serving as the DHCP server, may then respond to the DHCP request from the PDU 130 and provide the IP address to the PDU 130, as depicted at 332.
As part of the discovery process, the rack manager 170 requests (as depicted at 336) the PDU FRU data from the primary chassis management controller 126-1, and the primary chassis management controller 126-1 returns the PDU FRU data to the rack manager 170, as depicted at 340.
As depicted at reference numeral 334, in accordance with example implementations, the rack manager 170 may request (via an application programming interface (API) call, for example) the forwarding tables from the network switches 124, and accordingly, the switches 124 return data representing the forwarding tables, as depicted at reference numeral 348. The rack manager 170 may then match the MAC addresses from the discovered rack support units, such as the PDU 130, to the MAC addresses of the forwarding tables for purposes of learning the PDU and CDU locations based on the network connections to their respective chassis management controllers 126. As depicted at reference numeral 354, the rack manager 170 may then create DHCP reservations.
The PDU 130 also issues a DHCP request, as depicted at 428, for purposes of obtaining an IP address, and this DHCP request traverses a port of the ToR switch 119. From this information, the ToR switch 119 may then learn the MAC address of the PDU 130, as depicted at 429. Moreover, as depicted at 430, from the DHCP request, the rack manager 170 learns the PDU location and the MAC address of the PDU. Moreover, as depicted at 434, the rack manager 170 responds to the PDU 130 with an IP address.
As depicted at 440, the rack manager 170 may then request PDU FRU data from the primary chassis management controller and receive, as depicted at 444, the PDU FRU data from the primary chassis management controller 126. The rack manager 170 may then request (as depicted at 446) the forwarding table from the ToR switch 119, and the ToR switch 119 may then return the forwarding table, as depicted at 452. The forwarding table, in turn, contains a port-to-MAC address mapping, showing the relationships between these chassis management controller MAC addresses and ports, and also showing the relationship between the PDU MAC address and particular port of the ToR switch 119. Therefore, the rack manager 170 may, through these mappings, associate the PDU 130 with a particular chassis management controller 126, as depicted at 454.
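The ToR-based association may be sketched as follows: a support unit that is cabled to the switch of a chassis management controller is seen by the ToR switch on the same port as that controller. The function name `associate_via_tor` and the data shapes are assumptions made for illustration, and MAC addresses are assumed to be normalized to a single case before lookup.

```python
def associate_via_tor(tor_table, cmc_macs, support_unit_macs):
    """Associate support units with controllers using one ToR forwarding table.

    tor_table:         dict mac -> ToR port (the ToR forwarding table).
    cmc_macs:          dict chassis_id -> MAC of that chassis management controller.
    support_unit_macs: dict support unit name -> MAC address.

    Returns dict support unit name -> chassis_id for every unit whose MAC
    shares a ToR port with a known chassis management controller.
    """
    # Invert the controller entries: ToR port -> chassis ID.
    port_to_chassis = {}
    for chassis_id, mac in cmc_macs.items():
        port = tor_table.get(mac)
        if port is not None:
            port_to_chassis[port] = chassis_id

    associations = {}
    for name, mac in support_unit_macs.items():
        port = tor_table.get(mac)
        if port in port_to_chassis:
            associations[name] = port_to_chassis[port]
    return associations
```

Compared with querying the switch of every chassis management controller, this variant needs only a single forwarding table request per rack.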
Although implementations are described herein in which MAC addresses are learned from DHCP requests, in accordance with further example implementations, the PDUs, CDUs and chassis management controllers may request IP addresses using DHCP version 6 (DHCPv6) requests, which contain unique identifiers, instead of MAC addresses, for the requesting clients. For these further example implementations, the forwarding tables contain port to unique identifier mappings, and the rack manager may use this information to associate rack support units with chassis management controllers. Both MAC addresses and DHCPv6 unique identifiers are considered herein to be “identifiers” for components derived from DHCP requests, such as, for example, identifiers for the rack support units.
Referring to
Referring to
Referring to
Referring to
In accordance with an example implementation, a DHCP request is received from the support unit, where the DHCP request is communicated through a port of a plurality of ports of a network switch corresponding to the given secondary chassis management controller. A particular advantage of the DHCP request from the support unit is that the network switch maps an association between an identifier for the support unit and the port of the network switch.
In accordance with an example implementation, a plurality of DHCP requests for IP addresses are received from a plurality of chassis management controllers, where each chassis management controller of the plurality is associated with a chassis of a plurality of chassis mounted to the rack; and each DHCP request is associated with a different chassis management controller and contains data representing a chassis location identifier for the associated chassis management controller. A particular advantage of this arrangement is that the chassis management controllers may be associated with a location in the rack.
In accordance with example implementations, each DHCP request may include data representing a rack identifier. A particular advantage of this arrangement is that the chassis management controllers may be associated with a particular rack.
In accordance with example implementations, the support unit is associated with a chassis location identifier associated with the given secondary chassis management controller. A particular advantage of this association is that configuration and/or cabling errors may be detected.
In accordance with an example implementation, the forwarding table includes a port-to-identifier mapping, and identifying the given secondary chassis management controller includes using the identifier for the support unit as an index to the mapping to identify a port of the network switch corresponding to the given secondary chassis management controller. A particular advantage of this association is that configuration and/or cabling errors may be detected.
In accordance with example implementations, accessing the network switch forwarding table data includes requesting forwarding table data stored in a plurality of network switches that are located in the chassis management controllers. A particular advantage of this association is that configuration and/or cabling errors may be detected.
In accordance with example implementations, accessing the network switch forwarding table data includes requesting forwarding table data that is stored in a top-of-the-rack (ToR) network switch that has a separate network cable connection to each chassis management controller of the chassis management controllers. A particular advantage of this arrangement is that it may be easier to request the forwarding table data from a single network switch.
While the present disclosure has been described with respect to a limited number of implementations, those skilled in the art, having the benefit of this disclosure, will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations.