The embodiments discussed herein are related to an information processing system and a method of controlling hardware of the information processing system.
An information processing system has a plurality of processing devices (hardware) and a system control device which controls each of the plurality of processing devices. The system control device executes system control, such as power supply control and configuration control, for the whole system. In an information processing system with a single system control device, when the system control device fails, it is difficult for the whole system to keep operating. For this reason, the information processing system employs a redundant configuration in which two or more system control devices are provided. In the redundant configuration, the system control is executed with one system control device in an operating state and another system control device in a standby state.
[Patent Document]
Because one of the system control devices controls the hardware of each of the plurality of processing devices through a single interface, the time needed to control the hardware grows as the number of processing devices (hardware) that are control targets of the system control device increases in the information processing system. Likewise, when the number of LSI (Large Scale Integrated) circuits that are control targets of the system control device increases in a processing device (system board), it takes a long time to control the hardware because a single interface is used for control.
In particular, in a large-scale information processing system such as an HPC (High Performance Computing) computer, the system control device controls a huge number of processing devices and LSI devices (hardware), so hardware control such as hardware power-on, initialization, and termination clearly takes a long time.
One feature of the information processing system is that it includes a plurality of processing devices, each of which has hardware which performs information processing, and a plurality of system control devices that execute hardware control of the hardware in the plurality of processing devices according to a hardware control instruction, wherein one of the plurality of system control devices determines the share amount of the hardware control that each of the plurality of system control devices takes on, according to the processing amount of the hardware control of the plurality of processing devices, and each of the plurality of system control devices executes the hardware control of the determined share amount in the plurality of processing devices.
Further, one feature of a method of controlling hardware in an information processing system having a plurality of processing devices, each of which has hardware which performs information processing, and a plurality of system control devices is that the method includes calculating, by one of the plurality of system control devices, the processing amount needed to execute hardware control of the plurality of processing devices according to a hardware control instruction, determining the share amount of the hardware control that each of the plurality of system control devices takes on, and executing the hardware control of the determined share amount by each of the plurality of system control devices.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Below, embodiments will be described in the order of the information processing system, the hardware control sequence, the hardware control method according to a first embodiment, the hardware control method according to a second embodiment, the hardware control method according to a third embodiment, the hardware control method according to a fourth embodiment, and other embodiments, but the information processing system and the hardware control are not limited to these embodiments.
(Information Processing System)
As illustrated in
A pair of system control devices (service processors) 1A and 1B execute monitoring and various settings of the system boards 2A and 2B. A first system control device 1A connects to the system board 2A by a first signal line LA1, and connects to the system board 2B by a second signal line LA2. A second system control device 1B connects to the system board 2A by a third signal line LB1, and connects to the system board 2B by a fourth signal line LB2.
AC (Alternating Current) power (AC 200V) is input to a power supply unit (PSU) 32 via a circuit breaker 30. The power supply unit 32 converts the AC power (for example, 200V) to DC (Direct Current) power. The power supply unit 32 always supplies constant power (for example, DC 8V) to the service processors 1A and 1B through power lines P3 and P4. When the power supply unit 32 receives a power-on instruction from the service processors 1A and 1B, it supplies power (for example, DC 48V) to the system boards 2A and 2B through power lines P1 and P2.
Each of the service processors 1A and 1B executes a hardware control sequence program 10 and a control program 12. The hardware control sequence program 10 executes a hardware control sequence according to a hardware control instruction from an operator. The hardware control sequence controls each LSI element 20, which is the hardware. For example, the hardware control sequences include a power-on sequence, an initialization sequence, and a termination sequence, as described later.
The hardware control sequence program 10 converts the sequence to be executed into a plurality of fine-grained hardware control processes, and passes them to the control program 12 of the service processors 1A and 1B. The control program 12 executes the processes instructed by the hardware control sequence program 10. The control program 12 outputs instructions conforming to, for example, the I2C (Inter-Integrated Circuit) and JTAG (Joint Test Action Group) communication standards to the hardware (system boards) 2A and 2B. The I2C communication is mainly used to control the power supply systems of the system boards 2A and 2B. The JTAG communication is used to process the LSI devices 20.
In this embodiment, instead of only the active (operating) service processor (system control device) executing the hardware control, the redundant service processors 1A and 1B together execute the hardware control sequence in order. During the control, the service processors 1A and 1B decompose the hardware control sequence into fine-grained control targets and control contents, and share the decomposed control targets and control contents among themselves according to the processing amount.
The overall control flow is as follows. Note that, in the following description, the service processor 1A is in the operating state while the service processor 1B is in the standby state.
(1) The type of the hardware control sequence to execute is determined from a user operation. That is, the user specifies the type of the hardware control sequence to the service processor 1A. In this example, the unit of control is the system, such as “system power on” or “system reset”.
(2) The hardware control sequence program 10 of the service processor 1A decomposes the specified hardware control sequence into detailed instructions for each hardware unit.
(3) The service processor 1A determines which of the redundant service processors 1A and 1B handles each detailed instruction. The hardware control targets (system boards 2A and 2B) follow the control instructions from both of the service processors 1A and 1B.
(4) The control program 12 of each of the service processors 1A and 1B dispatches its share of the control instructions to the system boards 2A and 2B.
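Steps (1) to (4) above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the names `SEQUENCES`, `decompose`, and `share`, the step names, and the board-to-processor assignment are all hypothetical.

```python
from collections import defaultdict

# Hypothetical system-level sequences, each listed as per-board steps.
SEQUENCES = {
    "system power on": ["power_on", "check_voltage"],
    "system reset": ["initialize", "configure", "release_halt"],
}

def decompose(sequence, boards):
    """Step (2): expand a system-level sequence into (board, step) instructions."""
    return [(board, step) for step in SEQUENCES[sequence] for board in boards]

def share(instructions, owner_of):
    """Step (3): group the detailed instructions by the service processor in charge."""
    shared = defaultdict(list)
    for board, step in instructions:
        shared[owner_of[board]].append((board, step))
    return dict(shared)

# Assumed assignment: 1A handles system board 2A, 1B handles system board 2B.
owner_of = {"2A": "1A", "2B": "1B"}
plan = share(decompose("system power on", ["2A", "2B"]), owner_of)
```

Each entry of `plan` is the list that the control program of one service processor would dispatch in step (4), preserving the order of the original sequence.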
In this way, because both the active service processor 1A and the standby service processor 1B share the execution of the instructed hardware control sequence, it is possible to execute the hardware control of the plurality of system boards 2A and 2B at high speed. In
The HPC system will be described according to
Each of the CPUs 22A˜22D connects to the system controller 24. The system controller 24 connects to the memory access controller 26, which is connected to the host memory 28. The system controller 24 connects to a plurality of I/O (Input/Output) boards 4A˜4N via a crossbar switch 3.
Each I/O board has an I/O controller and a plurality of PCI (Peripheral Component Interconnect) Express slots 42. The PCI Express slots are connected to an external memory (a large capacity memory and/or a storage device) or a network interface card (NIC). The system controller 24 controls transfers between the CPUs 22A˜22D and the memory access controller 26, and between the crossbar switch 3 and the CPUs 22A˜22D and the memory access controller 26. The crossbar switch 3 directly connects the system boards 2A and 2B to each other, and the system boards 2A, 2B to the I/O boards 4A˜4N, on a one-to-one basis.
The crossbar switch 3 makes it possible to transfer data at high speed between the system boards 2A and 2B and between the system boards 2A, 2B and the I/O boards 4A˜4N. In the example of
The service processors 1A and 1B are interconnected by a communication route 50. The communication route 50 is preferably a LAN (Local Area Network). A terminal device 5A connects to the communication route 50, instructs the service processors 1A and 1B to execute a hardware control sequence according to the user operation, and obtains their status. The terminal device 5A is preferably a personal computer.
In
As described in
The service processor 1A connects to the JTAG controller 27 in the system board 2A via the JTAG signal line LA10, and connects to the I2C controller 29 in the system board 2A through the I2C signal line LA12. The JTAG controller 27 connects to the CPUs 22A˜22D and to the system controller 24. The I2C controller 29 connects to the system controller 24, the memory 28, and the memory access controller 26. The memory 28 includes DIMMs (Dual Inline Memory Modules).
As illustrated in
The service processor 1A executes hardware setting and initialization of the CPUs 22A˜22D and the system controller 24 via the JTAG controller 27, and controls the power supply in the system board 2A through the I2C controller 29.
As illustrated in
A board controller (described as Controller FPGA in
A pair of A/D (Analog/Digital) converters 37, 38 convert the output voltages of the first DDC 35 and the second DDC 38 to digital values (voltage values). The I2C controller 29 reads the voltage value of each DDC from the A/D converters 37 and 38. The service processor 1A reads the voltage values from the I2C controller 29 through the signal line LA12 and checks them.
(Hardware Control Sequence)
In
Shared database files 16 are database files for sharing information between the service processors 1A and 1B. The service processors 1A and 1B write information to both shared database files 16, thereby keeping the shared databases in the two service processors synchronized.
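The mirrored write described above can be illustrated with a small sketch. It is an assumption-laden model rather than the embodiment's implementation: the class `SharedDatabase`, the helper `link`, and the key and value used in the example are hypothetical, and a real system would send the update over the communication route instead of touching the peer's memory directly.

```python
class SharedDatabase:
    """One service processor's copy of the shared data, mirrored to its peer."""

    def __init__(self):
        self.records = {}
        self.peer = None  # the other service processor's copy, set by link()

    def write(self, key, value):
        """Store the value locally and in the peer copy so both stay identical."""
        self.records[key] = value
        if self.peer is not None:
            self.peer.records[key] = value


def link(db_a, db_b):
    """Pair two copies so that a write to either one reaches both."""
    db_a.peer, db_b.peer = db_b, db_a


db_1a, db_1b = SharedDatabase(), SharedDatabase()
link(db_1a, db_1b)
db_1a.write("power on / SB#0", 3)  # hypothetical key and value
```

After the write, `db_1b.records` holds the same entry as `db_1a.records`, which is the property the standby side relies on when it has to take over.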
The node control object 12A executes the hardware control in accordance with the hardware control sequences received from the execution object 10. The I2C control object library (hereinafter referred to as I2C control object) 12B receives the instructions from the node control object 12A, and executes processing for the power supply system as described in
Next, the hardware control sequence will be explained, taking the power supply control sequence and the reset control sequence as examples, by using
In the power supply control sequence, the service processor 1A powers on the power supply unit 32 in process number 0. Thereby, power is supplied to the system boards 2A and 2B. In the following process number 1, the service processor 1A checks the output voltage of the power supply unit 32. After checking, in process number 2, the service processor 1A outputs the load voltage (12 V) to the system boards 2A and 2B. As described in
In process number 3, the service processor 1A checks the load output of the DDC 35. As described in
After checking, in process number 4, the load voltage of the system boards 2A and 2B is output. As described in
In process number 5, the service processor 1A checks the load output of the DDC 37. As described in
Similarly, in process number 6, the service processor 1A outputs the load voltage (3.3 V) in the system boards 2A and 2B. In process number 7, the service processor 1A checks the load output of the third DDC. In process number 8, the service processor 1A outputs the load voltage (1.8 V) in the system boards 2A and 2B. In process number 9, the service processor 1A checks the load output of the fourth DDC.
Thus, the service processor 1A executes the power supply control sequence. In the embodiment, power saving is expected because power is not supplied at all times from the power supply unit 32 to the system boards 2A and 2B. Power-on and checking of the output voltage value are performed in order, starting from the power supply unit 32 and proceeding through the first DDC 35, the second DDC 37, the third DDC, and the fourth DDC. Therefore, it is possible to perform the power control smoothly.
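The output-then-check ordering of process numbers 0 to 9 can be sketched as below. The function names `power_on_steps` and `run`, the stage labels, and the `output` and `check` callbacks are hypothetical, and stopping at the first failed check is an assumption: the text above does not say how a failed check is handled.

```python
def power_on_steps(stages):
    """Number the steps so that each stage is switched on and then checked."""
    steps = []
    for stage in stages:
        steps.append((len(steps), "output", stage))
        steps.append((len(steps), "check", stage))
    return steps


def run(steps, output, check):
    """Execute the steps in order.

    Returns the process number of the first failed check (assumed behavior),
    or None when every check passes.
    """
    for number, action, stage in steps:
        if action == "output":
            output(stage)
        elif not check(stage):
            return number
    return None


# Stage labels follow the order given in the text; they are illustrative only.
STAGES = ["PSU 32", "first DDC", "second DDC", "third DDC", "fourth DDC"]
STEPS = power_on_steps(STAGES)
```

With five stages the sketch yields ten steps, so that, for example, step 6 switches on the third stage and step 7 checks it, mirroring the numbering used in the sequence above.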
Next, the reset control sequence will be explained. In process number 10, the service processor 1A initializes the system controller 24. As described in
In process number 11, the service processor 1A initializes the memory access controller 26 and the memory 28. As described in
In process number 12, the service processor 1A initializes the CPUs 22A˜22D. As described in
Next, in process number 13, the service processor 1A executes the setting process of the system controller 24. As described in
In process number 14, the service processor 1A executes the setting of the memory access controller 26. As described in
In process number 15, the service processor 1A executes the setting of the CPUs 22A˜22D. As described in
In process number 16, the service processor 1A releases the halts of the CPUs 22A˜22D and puts the CPUs 22A˜22D into a start state. As described in
In this way, each LSI element is set and the CPUs 22A˜22D start operating.
The hardware control process in
(S10) A user of the system provides instructions by operating the terminal device 5A. The terminal device 5A outputs control instructions to the service processor via the LAN 50 using a CLI (Command Line Interface) or GUI (Graphical User Interface) in the terminal device 5A. In the embodiment, the instructions are output in units of the system, such as system power on, system reset, or system power off. The instruction reception object 14 in the service processor 1B (standby state) receives the instructions issued by the system user (as indicated by (1) in
(S12) The execution object 10 in the service processor 1A, which is in the operating state, determines the hardware control sequence to execute from the received instructions. In the above embodiment, the hardware control sequence is determined to be the system power on or the system reset.
(S14) As described in
(S16) The execution object 10 decides which service processor executes the hardware control. As will be described in detail in
(S18) The execution object 10 transfers the shared process contents to the node control object 12A of the service processor 1A or 1B that has been assigned the share. At this time, the execution object 10 stores the data necessary for duplex switching in the shared database files 16. The data stored in the shared database files 16 includes a checkpoint indicating how far the hardware control sequence has been executed and the number of the service processor 1A or 1B which is executing the hardware control (as indicated by (3C) in
(S20) In each of the service processors 1A and 1B that share the processing, the node control object 12A receives the instruction from the execution object 10. The node control object 12A analyzes the received instruction. The node control object 12A sends the instruction to the I2C control object 12B when it is an I2C control instruction, and sends the instruction to the JTAG control object 12C when it is a JTAG control instruction (as depicted by (4) in
(S22) The hardware 20 in the system boards 2A and 2B sequentially executes the received commands. Therefore, as described by using
(S24) The execution object 10 of the service processor 1A updates the data necessary for duplex switching in the shared database file 16 upon reception of the response. When synchronization of the shared database files 16 is necessary, the service processor 1A waits for the response to come back from the control program 12 (the node control object 12A, the JTAG control object 12C, and the I2C control object 12B) of the other service processor 1B (as depicted by (6) in
(S26) The execution object 10 of the service processor 1A judges whether or not all of the instructed hardware control sequences have completed. When the execution object 10 determines that not all of the hardware control sequences have completed, it returns to step S14 and advances to the execution of the next hardware control sequence. Conversely, when the execution object 10 judges that all of the instructed hardware control sequences have completed, it finishes the hardware control. The execution object 10 of the service processor 1A makes the same judgment, and proceeds in the same way, when it receives the response from the other service processor 1B.
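The routing in S20 and the completion check in S26 can be sketched as follows. This is a minimal model under stated assumptions: the classes `ControlObject` and `NodeControlObject`, the function `run_sequence`, the `(bus, target, command)` tuple layout, and the `"ok"` response are hypothetical stand-ins, and no real I2C or JTAG traffic is generated.

```python
class ControlObject:
    """Stand-in for the I2C or JTAG control object; it only records commands."""

    def __init__(self, name):
        self.name = name
        self.issued = []

    def issue(self, target, command):
        self.issued.append((target, command))
        return "ok"  # assumed response from the hardware


class NodeControlObject:
    """Analyzes each instruction and forwards it by bus type, as in S20."""

    def __init__(self):
        self.routes = {"I2C": ControlObject("I2C"), "JTAG": ControlObject("JTAG")}

    def execute(self, instruction):
        bus, target, command = instruction
        if bus not in self.routes:
            raise ValueError(f"unknown bus type: {bus}")
        return self.routes[bus].issue(target, command)


def run_sequence(node, instructions):
    """Issue instructions in order; True only if every response is 'ok'.

    Evaluation stops at the first response that is not 'ok'.
    """
    return all(node.execute(instruction) == "ok" for instruction in instructions)
```

Keeping the routing table in one place means a further control configuration could be added by registering another control object, without changing the loop that drives the sequence.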
Next, the determination process (S16) of the service processor in
(S30) The execution object 10 counts the processing amount of the hardware control in each of the service processors 1A and 1B. In the example, the execution object 10 obtains the number of LSI elements 20 mounted on each of the four system boards #0˜#3 from the configuration definition table, as indicated by
(S32) The execution object 10 compares the processing amounts of the service processors 1A and 1B in the processing amount table in
(S34) When the comparison shows that the difference in processing amount between the service processor 1A and the service processor 1B is large, the execution object 10 updates the hardware control sharing table 104. For example, the execution object 10 reassigns, from one service processor to the other, a system board whose processing amount (the number of LSI elements in the embodiment) is close to the difference in processing amount between the service processor 1A and the service processor 1B. When the difference is small, the execution object 10 exits without updating the hardware control sharing table 104. Here, whether the difference is small or large is judged on the basis of the system board which mounts the smallest number of LSI elements.
For example,
The hardware control sharing table 104 is a table indicating the correspondence between each system board (described as “control target hardware” in the figure) and the service processor in charge of it (described as “SP” in the figure). The system board column indicates the system board number (#0˜#3), and the service processor column indicates the number of the service processor (0=1A and 1=1B in the figure) which is in charge of the hardware control of the control target. First, the processing amount table 102 as depicted as
As described above, because the service processors 1A and 1B in charge of the hardware control are determined so that their processing amounts are equal, it is possible to execute the hardware control at high speed even though the service processors share the hardware control.
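The determination process S30 to S34 can be illustrated with the sketch below. Several points are assumptions and not taken from the embodiment: the LSI counts and the initial sharing table are invented for the example, "large" is read as "the difference exceeds the LSI count of the smallest board", and the board to move is chosen as the one that leaves the smallest remaining difference, which is a simplification of the selection rule described in S34.

```python
def processing_amounts(lsi_count, sharing, processors=(0, 1)):
    """S30: sum the LSI elements of the boards each service processor handles."""
    totals = {sp: 0 for sp in processors}
    for board, sp in sharing.items():
        totals[sp] += lsi_count[board]
    return totals


def rebalance(lsi_count, sharing, processors=(0, 1)):
    """S32 to S34: move one board off the busier processor if that narrows the gap."""
    totals = processing_amounts(lsi_count, sharing, processors)
    busy = max(processors, key=lambda sp: totals[sp])
    idle = min(processors, key=lambda sp: totals[sp])
    gap = totals[busy] - totals[idle]
    # Assumed reading of "large": the gap exceeds the smallest board's LSI count.
    if gap <= min(lsi_count.values()):
        return dict(sharing)
    candidates = [board for board, sp in sharing.items() if sp == busy]
    # Moving a board with n LSI elements leaves a gap of |gap - 2n|.
    board = min(candidates, key=lambda b: abs(gap - 2 * lsi_count[b]))
    if abs(gap - 2 * lsi_count[board]) >= gap:
        return dict(sharing)  # no single move improves the balance
    updated = dict(sharing)
    updated[board] = idle
    return updated


# Hypothetical configuration: LSI elements per system board #0 to #3.
lsi_count = {0: 8, 1: 8, 2: 4, 3: 4}
sharing = {0: 0, 1: 0, 2: 0, 3: 1}  # board number -> SP number (0=1A, 1=1B)
balanced = rebalance(lsi_count, sharing)
```

In this invented example the initial amounts are 20 and 4; moving board #0 to service processor 1 gives 12 and 12, after which a second call to `rebalance` leaves the table unchanged.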
Sharing the hardware control in the above construction is also effective in dealing with a failure of one service processor. In this case, the hardware control targets (here, the system boards) which were assigned to the failed service processor are selected from the hardware control sharing table 104 and assigned to the service processor which has not failed. It is therefore possible to continue the hardware control after the failure of a service processor merely by rewriting the hardware control sharing table 104.
When the failed service processor is restored, it is effective to assign the hardware control targets again as depicted in
In this way, it is possible to execute the hardware control at high speed, because controlling the hardware of the system boards in parallel from the plurality of system control devices in the redundant construction increases the number of pieces of hardware that can be controlled at the same time. In addition, because the hardware control sequence is decomposed into fine-grained control targets and control contents and the processing is divided among the plurality of service processors according to the processing amount while the hardware control sequence is executed in order, it is possible to control the hardware at a still higher speed.
The second embodiment is an example of applying the first embodiment to an information processing system with a partition configuration. A partition is defined as a group of a plurality of pieces of hardware in the information processing system, and forms a section such that the groups do not interfere with each other in the system. The partition will be explained by using
The second embodiment will be explained by using
(1) As similar to S10 in
(2) As similar to S12 and S14 in
(3) As similar to S16 in
Therefore, without changing the service processor which charges, according to the processing amount table 102A as depicted in
(4) As similar to S18 in
(5) As similar to S20 in
(6) As similar to S22 in
(7) The execution object 10 of the service processor 1A updates the data necessary for duplex switching in the shared database file 16 upon reception of the response. When synchronization of the shared database files 16 is necessary, the service processor 1A waits for the response to come back from the control program 12 (the node control object 12A, the JTAG control object 12C, and the I2C control object 12B) of the other service processor 1B. The execution object 10 of the service processor 1A judges whether or not all of the instructed hardware control sequences of the partition 0 have completed. When the execution object 10 determines that not all of the hardware control sequences have completed, it advances to the execution of the next hardware control sequence. Conversely, when the execution object 10 judges that all of the instructed hardware control sequences have completed, it finishes the hardware control.
(8) While the hardware in the system boards #0, #1, #2 of the partition 0 is executing the hardware control, when the instruction reception object 14 of the operating service processor 1A receives control instructions for the partition 1 issued by the user of the system, the service processor 1A executes the two controls in parallel. That is, the instruction reception object 14 transfers the received control instruction for the partition 1 to the execution object 10.
(9) As similar to S12 and S14 in
(10) As similar to S16 in
(11) As similar to S18 in
(12) As similar to S20 in
As similar to S22 in
In the partition construction, the service processors 1A and 1B in charge of the hardware control are determined so that their processing amounts are the same; therefore it is possible to execute the hardware control at high speed even though the service processors share the processing of the hardware control. In addition, because the hardware control of another partition is executed during idle time in the hardware control of one partition, faster hardware control can be realized.
The third embodiment illustrates the processing when one service processor fails. Below, the third embodiment will be explained by using
(1) As similar to S10 in
(2) The instruction reception object 14 transfers the received control instruction to the execution object 10 of the service processor 1B. The execution object 10 of the service processor 1B checks the status of the service processor 1B and, when the service processor 1B is in the standby status, transfers the received instruction to the execution object 10 of the service processor 1A, which is in the operating status.
(3A) As similar to S12 in
(3B) As similar to S16 in
(3C) As similar to step S18 in
(4) As similar to S20 in
(5) As similar to S22 in
(7) The redundant service processors 1A and 1B monitor each other's heartbeat. In the heartbeat monitoring, the service processors communicate with each other at a predetermined time interval, and one service processor determines that there is no heartbeat response when it does not detect communication from the other service processor within the predetermined time. The determination result is notified to the hardware control sequence execution object 10 as an event.
(8) The execution object 10 re-distributes the processing of which the failed service processor 1A was in charge, according to the notification of no heartbeat response. That is, the execution object 10 searches the shared database file 16 for the checkpoints of the hardware control assigned to the failed service processor 1A. The shared database file 16 synchronizes its written contents with the other database file, and stores the number of the service processor which executed the hardware control sequence and the checkpoint indicating how far the hardware control sequence has been executed.
(9) The execution object 10 updates the hardware control sharing table 104, as detailed in
(10) The execution object 10 instructs the node control object 12A to restart the control from the recorded checkpoint in accordance with the updated hardware control sharing table 104. That is, according to the updated hardware control sharing table 104, the execution object 10 issues the instructions for the hardware control of the system boards #0, #1, #2, #3.
(11) The execution object 10 stores the data necessary for duplex switching (a checkpoint indicating how far the hardware control sequence has been executed and the number of the service processor 1A or 1B which is executing the hardware control) in the shared database files 16.
(12) The node control object 12A sends the instruction to the I2C control object 12B when it is an I2C control instruction, and sends the instruction to the JTAG control object 12C when it is a JTAG control instruction. In the service processor 1B, the I2C control object 12B and the JTAG control object 12C issue the commands to the hardware (LSI elements 20) in the system boards 2A and 2B (SB#0, SB#2) through the signal lines LA1˜LB2, as described above by using
The charge changing process will be described by using
(S40) When the execution object 10 in one service processor receives the notification that the other service processor is abnormal, it selects the hardware control targets (here, the system boards #0, #2) which were assigned to the failed service processor in the hardware control sharing table 104, and assigns them to the service processor which has not failed. As illustrated in
(S42) The execution object 10 reads all hardware control sequences under execution from the shared database file 16 and determines, for each one read, whether or not the failed service processor 1A was in charge of it. For a hardware control sequence of which the failed service processor 1A was in charge, the execution object 10 of the service processor 1B instructs the node control object 12A to restart from the checkpoint of that hardware control sequence according to the new hardware control sharing table 104. For a hardware control sequence of which the failed service processor 1A was not in charge, the execution object 10 of the service processor 1B instructs the node control object 12A to continue the processing from the checkpoint of the hardware control sequence of which the service processor 1B is in charge.
In the normal status, the execution object 10 in each of the service processors 1A and 1B stores, in the shared database file 16, the number of the service processor assigned to execute each hardware control sequence when issuing instructions to the node control object 12A. Whenever the node control object 12A advances the hardware control sequence, it updates the checkpoint. The updated information is written to the shared database file 16 so that it is synchronized between the service processors 1A and 1B.
In this way, when one service processor fails, the hardware control target (here, the system board) which was assigned to the failed service processor is selected in the hardware control sharing table 104 and assigned to the service processor which has not failed. Therefore, it is possible to continue the hardware control after the failure of a service processor merely by rewriting the hardware control sharing table 104.
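The failure handling of the third embodiment (heartbeat timeout, rewriting the sharing table in S40, and resuming from checkpoints in S42) can be sketched as follows. The class `HeartbeatMonitor`, the functions `fail_over` and `resume_plan`, the timeout value, and the record layout with `target`, `sp`, and `checkpoint` fields are all hypothetical; time is passed in explicitly so that the sketch stays deterministic.

```python
class HeartbeatMonitor:
    """Tracks when the peer service processor was last heard from."""

    def __init__(self, timeout, on_failure):
        self.timeout = timeout
        self.on_failure = on_failure  # callback standing in for the event
        self.last_seen = 0.0
        self.failed = False

    def beat(self, now):
        """Record a heartbeat received from the peer at time `now`."""
        self.last_seen = now
        self.failed = False

    def poll(self, now):
        """Fire the no-response event once when the peer has been silent too long."""
        if not self.failed and now - self.last_seen > self.timeout:
            self.failed = True
            self.on_failure()
        return self.failed


def fail_over(sharing, failed_sp, surviving_sp):
    """S40: hand every control target of the failed processor to the survivor."""
    return {target: surviving_sp if sp == failed_sp else sp
            for target, sp in sharing.items()}


def resume_plan(records, sharing, failed_sp):
    """S42: restart sequences of the failed processor, continue the others."""
    plan = []
    for record in records:
        action = "restart" if record["sp"] == failed_sp else "continue"
        plan.append((action, record["target"], record["checkpoint"],
                     sharing[record["target"]]))
    return plan
```

Because `fail_over` returns a new table and leaves the original untouched, the earlier assignment remains available when the replaced service processor rejoins and the targets are assigned again.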
Next, the operations performed when the failed service processor 1A is replaced and the system returns to the normal state will be described.
(A) The service processor 1A is replaced and started. The execution object 10 in the starting service processor 1A begins synchronization with the execution object 10 of the service processor 1B, which has already started. The execution object 10 in the service processor 1A, which started later, starts in the standby status.
(B) The execution object 10 in the service processor 1B, which is in the operating status, updates the hardware control sharing table 104 according to the process described in
(C) After the node control object 12A in the service processor 1B has completed the hardware control under execution, it passes the next hardware control to the replaced service processor 1A.
The above embodiments described examples in which the redundant system control devices (service processors) 1A and 1B share the hardware control in units of system boards. The fourth embodiment is an example in which the redundant system control devices (service processors) 1A and 1B share the hardware control in units of LSI elements in the system board.
And
The service processor 1A is in charge of the CPUs 22A˜22D (described by #0, #1, #2, #3 in
In this way, in the fourth embodiment, the hardware control is shared in units of LSI elements. Because the hardware control is shared in units of LSI elements, synchronization control of the hardware control using the shared database file 16 is especially effective.
Next, an example will be explained in which the sharing of the hardware control in units of LSI elements is applied to the information processing system with the partition configuration described in the second embodiment.
As indicated as
The decision process for the service processor in charge compares the processing amounts of the service processors, determines the service processor in charge of each piece of control target hardware, and creates the hardware control sharing table in
The service processor 1A (charge SP=0) is in charge of the CPUs 22A˜22D (described by #0, #1, #2, #3 in
In this way, in the partition construction, the hardware control is shared in units of LSI elements. Because the hardware control is shared in units of LSI elements, synchronization control of the hardware control using the shared database file 16 is especially effective.
In the above embodiments, the hardware control sequence was described as the power-on sequence and the reset sequence, but the embodiments can be applied to other hardware control sequences such as a power-off sequence. Also, the hardware control from the system control device was described using I2C and JTAG control, but other control configurations can be employed. Further, when recovery from the failure of one service processor is not considered, it is possible to omit the synchronization feature of the shared database file.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2010/057819 filed on May 7, 2010 and designated the U.S., the entire contents of which are incorporated herein by reference.
 | Number | Date | Country
---|---|---|---
Parent | PCT/JP2010/057819 | May 2010 | US
Child | 13669895 | | US