A computer system typically includes one or more buses to enable communication between devices. The buses include a processor bus or system bus to enable inter-communication among a processor, storage devices, input/output (I/O) devices, peripheral devices, and so forth. In addition, some computer systems include a management bus to enable management-related devices to communicate with each other separately from the processor or system bus.
Typically, management buses, such as an I2C bus, are provided in high-performance computer server systems, storage server systems, or other electronic systems. A management bus enables a management module to perform various types of management tasks, such as monitoring the health of various components of the system, disabling failed components, and so forth. For redundancy, it may be desirable to have multiple management modules that are connected to the management bus. However, a concern associated with connecting multiple management modules to a management bus is that failure of one management module may disable the management bus such that the remaining one or more management modules may not be able to communicate over the management bus. If a failed management module causes failure of the management bus, then the system may not operate properly, particularly when the remaining management module(s) can no longer communicate with certain devices to allow such devices to initialize or reset properly.
The processing modules 132 are able to communicate with each other over the backplane 100. The interconnect structures of the backplane 100 include multiple switch fabrics 116, 118, and 120. Each switch fabric 116, 118, 120 includes one or more switch fabric controllers (not shown). Each switch fabric 116, 118, 120 also includes interconnect circuits (communication lines or buses). Communication over the interconnect circuits is controlled by the respective switch fabric controllers.
The electronic system of
According to one embodiment, each of the reset and power management modules 108, 110, 112 performs the following tasks: power supply management; clock subsystem monitoring and reporting; and reset control of the switch fabric controllers. Other or alternative management tasks can be performed by the reset and power management modules 108, 110, 112 in other embodiments. The three reset and power management modules 108, 110, and 112 are redundant modules. If any one or even two of the reset and power management modules should exhibit failure, the electronic system can nevertheless continue to operate due to the presence of the remaining functional one or more reset and power management modules. Although three redundant reset and power management modules are depicted in
Thus, for example, if the reset and power management module 112 should fail, the remaining reset and power management modules 108, 110 can continue to perform management tasks with respect to the switch fabrics 116, 118 and power supplies 102, 104. The remaining reset and power management modules 108, 110 can also continue to perform monitoring of the clock subsystem 114.
The reset and power management modules 108, 110, 112 can be implemented as field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), microcontrollers, microprocessors, and so forth.
A feature of the reset and power management modules 108, 110, and 112 is that they are independent of each other and do not rely upon each other for their tasks. No handshaking or other form of interaction is performed between the reset and power management modules 108, 110, 112. As a result, a failure of any single reset and power management module does not corrupt the management and reporting tasks of the other reset and power management modules, which enhances the likelihood of continued operation of the electronic system. In accordance with some embodiments, to prevent failure of one reset and power management module from disabling or otherwise corrupting other reset and power management modules, the reset and power management modules 108, 110, and 112 are connected to a management bus structure 130 that includes a hub 106.
The hub 106 is effectively a bus isolation component to isolate defective reset and power management modules connected to the management bus structure 130. The hub 106 has multiple ports that are connected over a respective bus to a respective reset and power management module. In one embodiment, the bus is a serial bus that includes a data line and a clock line. An example of a bus that can be used for management tasks is the I2C bus. One version of the I2C bus is described by the I2C Bus Specification, Version 2.1, dated January 2000. In other embodiments, other types of buses can be used.
Each port of the hub 106 is enabled by a respective enable signal EN (ENA, ENB, ENC, and END depicted in
Enabling a port of the hub 106 means that the port allows the connected bus device to communicate over the management bus structure 130. Disabling a port of the hub 106 means that the port is isolated from the management bus structure. The reset and power management modules 108, 110, 112 communicate with the interface module 128 over the management bus structure 130. In turn, the interface module 128 communicates information (such as health information relating to the power supplies, switch fabrics, and/or clock subsystem) received from the reset and power management modules to other devices in the electronic system. Similarly, the interface module 128 can communicate information or commands received from other devices to the reset and power management modules 108, 110, 112.
More generally, the reset and power management modules are examples of “management modules” that perform management tasks (e.g., monitoring health, enabling/disabling, etc.) with respect to devices (e.g., power supplies, interconnect structures, clock subsystems, etc.).
By using the hub 106 according to some embodiments, effective bus isolation is provided in the event of failure of any of the reset and power management modules. A failed reset and power management module is prevented from communicating over the bus structure 130, and therefore from corrupting communications of the other reset and power management modules.
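The isolation behavior described above can be illustrated with a minimal sketch. It is a toy software model rather than a description of any actual hub circuit: the class names, the port names, and the broadcast-style delivery are assumptions made for illustration, and a real I2C transaction is addressed rather than broadcast.

```python
from dataclasses import dataclass, field


@dataclass
class HubPort:
    """One port of the hub; `enabled` mirrors the state of its EN input."""
    name: str
    enabled: bool = False


@dataclass
class Hub:
    """Toy model of a bus-isolating hub: only enabled ports see traffic."""
    ports: dict = field(default_factory=dict)

    def add_port(self, name: str) -> None:
        self.ports[name] = HubPort(name)

    def set_enable(self, name: str, active: bool) -> None:
        self.ports[name].enabled = active

    def broadcast(self, sender: str, message: str) -> dict:
        """Deliver `message` from `sender` to every other enabled port.

        A disabled sender is isolated, so nothing it drives reaches the bus.
        """
        if not self.ports[sender].enabled:
            return {}
        return {
            name: message
            for name, port in self.ports.items()
            if port.enabled and name != sender
        }


# Hypothetical port names, loosely following the reference numerals above.
hub = Hub()
for port_name in ("module_108", "module_110", "module_112", "interface_128"):
    hub.add_port(port_name)
    hub.set_enable(port_name, True)

# Module 112 fails, so its enable signal goes inactive and its port is isolated.
hub.set_enable("module_112", False)

delivered = hub.broadcast("module_108", "power supply status")
blocked = hub.broadcast("module_112", "corrupt traffic")
```

In this sketch the failed module neither receives traffic nor disturbs the remaining modules, which continue to reach the interface module.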
The bus device 218 includes bus master logic 222. The bus master logic 222 of the bus device 218 is connected to a port 234 of a hub 232 (which can be the hub 106 of
Each of the bus devices 200, 206, and 212 includes respective bus slave logic 204, 210, and 216. The bus slave logic 204 in the bus device 200 is connected to a port 236 of the hub 232. The port 236 is enabled by an ENA signal provided from power-on logic 202 in the bus device 200 to a corresponding enable input of the hub 232. Similarly, the bus slave logic 210 in the bus device 206 is connected to a port 238 of the hub 232. The port 238 is enabled by a power-on logic 208 in the bus device 206 asserting an ENB signal that is provided to a corresponding enable input of the hub 232. A port 240 of the hub 232 is connected to bus slave logic 216 in the bus device 212. The port 240 is enabled by an ENC signal from power-on logic 214 of the bus device 212, which is provided to a corresponding enable input of the hub 232.
During initialization, such as during a power-on sequence or a reset sequence, each of the power-on logic 202, 208, 214, and 220 maintains its respective ENA, ENB, ENC, or END signal deactivated such that the respective port 236, 238, 240, or 234 is disabled.
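The initialization behavior can be sketched as a small state model. The sketch assumes, as the later discussion of unsuccessful programming implies, that an enable signal is activated only once the device has been configured successfully. The `Phase` and `PowerOnLogic` names and the `finish_configuration` method are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    INITIALIZING = "initializing"
    CONFIGURED = "configured"
    FAILED = "failed"


class PowerOnLogic:
    """Hypothetical model of one device's power-on logic and its EN output."""

    def __init__(self) -> None:
        self.phase = Phase.INITIALIZING

    def finish_configuration(self, success: bool) -> None:
        # Assumption: the device reports whether its configuration loaded.
        self.phase = Phase.CONFIGURED if success else Phase.FAILED

    @property
    def enable_active(self) -> bool:
        # EN stays inactive during initialization and after a failure.
        return self.phase is Phase.CONFIGURED


logic = {name: PowerOnLogic() for name in ("ENA", "ENB", "ENC", "END")}
during_init = {name: unit.enable_active for name, unit in logic.items()}

logic["ENA"].finish_configuration(True)   # configured successfully
logic["ENB"].finish_configuration(False)  # defective device
after_config = {name: unit.enable_active for name, unit in logic.items()}
```

Every enable signal is inactive while the devices initialize, and only a device that completes configuration ever activates its own signal.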
In the embodiment illustrated in
Note that reference to pulling down the enable signal is provided for the purpose of example. In different implementations, the inactive state of the enable signal (ENA, ENB, ENC, or END) can be a high state, in which case pull-up resistors are used to pull the enable signal high when the power-on logic tristates its EN output. In other embodiments, other types of pull-up or pull-down devices can be used for inactivating the ENA, ENB, ENC, END signals.
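The interaction between a tristated EN output and a pull-up or pull-down resistor can be sketched as follows. This is an idealized logic-level model, not a circuit simulation. The function names are hypothetical, and the sketch assumes the resistor always pulls toward the inactive level.

```python
from typing import Optional


def resolve_enable_line(driver: Optional[bool], pull_high: bool) -> bool:
    """Return the logic level seen at the hub's enable input.

    driver: True/False when the power-on logic drives the line,
            None when its EN output is tristated.
    pull_high: True for a pull-up resistor, False for a pull-down.
    """
    if driver is None:
        return pull_high  # the resistor alone sets the level
    return driver  # an actively driven output overrides the resistor


def port_enabled(driver: Optional[bool], active_high: bool) -> bool:
    """Port state, assuming the resistor pulls toward the inactive level."""
    level = resolve_enable_line(driver, pull_high=not active_high)
    return level == active_high
```

With either polarity, a tristated output leaves the port disabled, and the port is enabled only when the power-on logic actively drives the line to its active level.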
Each bus device 200, 206, 212, and 218 also includes a non-volatile storage 250, 252, 254, 256, respectively. The non-volatile storage can be implemented as an electrically erasable and programmable read-only memory (EEPROM), a flash memory, a battery backed-up random access memory, or other type of non-volatile storage. In another embodiment, instead of using non-volatile storage, volatile storage can be used. The non-volatile storage 250, 252, 254, 256 is used to store configuration information that is transferred to a respective bus device 200, 206, 212, 218 during a configuration stage. The configuration information that is provided for storage in the non-volatile storage 250, 252, 254, 256 includes instructions (program code) that are executable by the bus device 200, 206, 212, 218 during operation of the bus device. The instructions when executed cause the bus device to perform programmed tasks, such as management tasks.
However, if successful programming cannot be performed, such as due to a defect or other failure of the bus device, the power-on logic does not change the state of its EN output. In other words, the power-on logic either maintains its EN output tristated (to enable an external pull-down resistor to pull the respective enable signal to an inactive state), or the power-on logic drives its EN output to the inactive state. Driving the enable signal to the inactive state effectively disables the corresponding port at the hub 232 (
In the foregoing description, numerous details are set forth to provide an understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these details. While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover such modifications and variations as fall within the true spirit and scope of the invention.