This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2009-178623, filed on Jul. 31, 2009, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a system and method of updating programs when clusters are to be added on in a multi-cluster system.
Multi-cluster systems have been widely used in mainframe systems. Normally, a multi-cluster system includes a plurality of clusters, a system storage unit, and a service processor manager (hereinafter referred to as an "SVPM").
In such multi-cluster systems, in order to operate the system stably, correction of the programs (called "patching") of each cluster (computer) and each system storage unit is performed. In order for a program to be corrected automatically without stopping the entire system, it has been proposed that the clusters be connected to a remote monitoring center (see, for example, Japanese Laid-open Patent Publication No. 2006-40188).
As a consequence of the correction of a program, when the clusters are operated as a multi-cluster system, an operation error occurs if the program version numbers of the individual computers do not match one another. For this reason, the compatibility of the program version numbers is confirmed.
In the illustrated example, each of the clusters 106 and 108 and the SSUs 102 and 104 has a version number management table file in which combinations of the up-to-date version number information on an HCP (hardware control program) and the version number information with which the clusters can be operated as a multi-cluster system are registered. The multi-cluster system 100 includes the clusters 106 and 108, the SSUs 102 and 104, and a service processor manager 101, and is connected to a remote monitoring center 120.
In the SSUs 102 and 104, the version number of the HCP in operating condition (hereinafter referred to as the "operation-system HCP") is "E16L02S-02A+5", and the version number of the HCP in standby condition (hereinafter referred to as the "standby-system HCP") is "E16L02S-02B+8". Furthermore, in the clusters 106 and 108, the version number of the operation-system HCP is "E60L02G-02A+5", and the version number of the standby-system HCP is "E60L02G-02B+8".
When a cluster is to be started, the master SSU-SVP 102 checks the combinations of operation-system and standby-system version numbers registered in its version number management table against the version number information in the HCP of each of the clusters 106 and 108 and the SSU 104, and confirms whether or not the HCP version number of each of the clusters 106 and 108 and the SSU 104 is a version number that can be incorporated in the system. This is referred to as a version number compatibility confirmation.
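By way of illustration only, and not as part of the original disclosure, the version number compatibility confirmation can be sketched as a lookup of each unit's reported version pair in the registered table. The table structure, function name, and unit-type keys below are assumptions; the version strings follow the example above.

```python
# Sketch of the version number compatibility confirmation performed by the
# master SSU-SVP. The table format is an assumption made for illustration.

# Registered combinations of (operation-system HCP, standby-system HCP)
# version numbers with which a unit may be incorporated in the system.
VERSION_MANAGEMENT_TABLE = {
    "SSU": {("E16L02S-02A+5", "E16L02S-02B+8")},
    "CLUSTER": {("E60L02G-02A+5", "E60L02G-02B+8")},
}

def confirm_compatibility(unit_type: str, operation_hcp: str, standby_hcp: str) -> bool:
    """Return True if the reported version pair is registered as one that
    can be incorporated in the system; otherwise the unit's startup fails."""
    registered = VERSION_MANAGEMENT_TABLE.get(unit_type, set())
    return (operation_hcp, standby_hcp) in registered

# Example: a cluster reporting the version numbers of the example passes.
assert confirm_compatibility("CLUSTER", "E60L02G-02A+5", "E60L02G-02B+8")
```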
A cluster or an SSU whose version numbers do not match a pattern registered in the master SSU-SVP as one that can be incorporated is not incorporated in the system, causing the startup of that cluster or SSU to fail.
Additionally, an update (patch update) of the clusters 106 and 108 and the SSUs 102 and 104 is performed at the time of a periodic connection to a remote monitoring center (as an example, two hours after the startup, and thereafter, every four hours).
When the periodic connection to the remote monitoring center 120 is made, the service processor manager 101 receives a version number management table file from the remote monitoring center 120 and distributes it to the master SSU-SVP 102. Then, the master SSU-SVP 102 distributes a patch to each of the clusters 106 and 108 and the SSU 104. The master SSU-SVP 102 performs a version number compatibility confirmation, for example, two hours after the startup of the system and every four hours thereafter.
The HCP includes an operation-system HCP that is used during operation, and a standby-system HCP that receives a patch at the time of an update. In the version number compatibility confirmation performed by the master SSU-SVP 102, the compatibility of the version number of the standby-system HCP is confirmed. When the version number of the standby-system HCP is older than the HCP version number registered in the version number management table file of the master SSU-SVP 102, the master SSU-SVP 102 instructs the cluster and the SSU to receive an up-to-date patch from the remote monitoring center 120.
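A minimal sketch of this update decision follows, again purely as an illustration. It assumes that version strings can be ordered by the numeric patch level after the "+" sign; this parsing rule and the function names are assumptions, not part of the disclosure.

```python
# Sketch of the patch-update decision at a periodic connection. Comparing
# versions by the numeric suffix after "+" is an assumption for illustration.

def patch_level(version: str) -> int:
    """Extract the trailing patch level, e.g. 'E60L02G-02B+8' -> 8."""
    return int(version.rsplit("+", 1)[1])

def needs_patch(standby_version: str, registered_version: str) -> bool:
    """The master SSU-SVP instructs a unit to receive an up-to-date patch
    from the remote monitoring center when its standby-system HCP is older
    than the version registered in the version number management table."""
    return patch_level(standby_version) < patch_level(registered_version)

print(needs_patch("E60L02G-02B+8", "E60L02G-02B+9"))  # True -> fetch patch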
After the patch is received from the remote monitoring center 120, the system is made to operate in the state of the up-to-date HCP by switching the standby-system HCP to the operation-system HCP with a CE (Customer Engineer) operation.
In some cases, a cluster is newly added on to such a multi-cluster system in order to expand the system.
On the other hand, with respect to such an add-on of a cluster, a method of porting setting data from another cluster has already been proposed (see, for example, Japanese Laid-open Patent Publication No. 2006-40188).
However, unlike the setting information, the updating of a program involves a large amount of data. With a method of simply porting a program from another cluster, updating the program of the add-on cluster during system operation takes much time and hinders the in-operation system, which is undesirable.
Furthermore, when a cluster is to be added on in a system in which a patch is transmitted from the remote monitoring center, the following problem arises. The add-on cluster receives the up-to-date patch from the remote monitoring center at the time of its startup, whereas the existing clusters and SSUs continue to operate with the HCP of an older version number until the next periodic connection. As a result of the distribution of the up-to-date patch, only the HCP of the add-on cluster is brought up to date, and its version number no longer matches the HCP version numbers of the in-operation clusters and SSUs.
As described above, when the version number of the HCP installed into the add-on cluster is not up-to-date and the HCP version number of the existing system is not up-to-date (for example, a case in which the add-on cluster has already been purchased, but has not been added on immediately), the update from the remote monitoring center allows only the HCP installed into the add-on cluster to be up-to-date, and an incompatible state is reached.
In order to switch the entire system to the up-to-date HCP state, the system must be stopped. However, this is inconvenient because it interrupts the business of the customer.
Furthermore, the following problem arises. As a result of the periodic connection, the patch reception of each cluster is performed after the passage of up to four hours, whereupon a match with the HCP version number of the add-on cluster is made. However, the CE performing the add-on operation must wait for a maximum of four hours, and the CE operation is not completed within the scheduled time period.
According to one example of the embodiments, a multi-cluster system includes a plurality of clusters that execute a program and are configured to receive a patch from a monitoring center to update the program, and a system storage unit that is connected to the plurality of clusters via a first network. When an add-on cluster to be added on to the multi-cluster system is connected to the first network, the add-on cluster receives, from the system storage unit, a version number management table and the program version number of an in-operation program in the plurality of clusters, requests the monitoring center to distribute a patch of the version number of the in-operation program, and receives the requested patch and updates the program that has been installed into the add-on cluster.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
It will be understood that when an element is referred to as being “connected to” another element, it may be directly connected or indirectly connected, i.e., intervening elements may also be present.
The two clusters 3A and 3B are connected to system storage units 4A and 4B by connection lines 6A to 6D, respectively; they read and write data and the like from and to the system storage units 4A and 4B, and perform desired processing.
Clusters 3A and 3B include a CPU block 10 that performs desired data processing, a service processor (SVP) 9, and an SVP communication adaptor (SCA: an example of a communication device) 8 connected to the service processor 9.
Furthermore, the system storage units 4A and 4B include a memory block 13 for storing data, a service processor (SVP) 12, and an SVP communication adaptor (SCA) 11 connected to the service processor 12.
SVP 9 of the clusters 3A and 3B and SVP 12 of the system storage units (hereinafter referred to as "SSUs") 4A and 4B are connected to one another via SCAs 8 and 11 through a local area network (LAN) 5. Furthermore, the SVP 9 of clusters 3A and 3B and the SVP 12 of system storage units 4A and 4B are connected to service processor manager (SVPM) 2 via SCAs 8 and 11 through LAN 5. LAN 5 is a network closed among SVPM 2, the clusters 3A and 3B, and the SSUs 4A and 4B.
SVPM 2 and a remote monitoring center 200 are remotely connected to one another via a telephone line 7. On the other hand, clusters 3A and 3B and SSUs 4A and 4B are not directly connected to an external network. In this example, multi-cluster system 1 is provided at the customer side. Remote monitoring center 200 is provided remotely from multi-cluster system 1, performs remote monitoring of a large number of multi-cluster systems 1, and, as described above, transmits a version number management table file at the time of a periodic connection of service processor manager 2 to remote monitoring center 200.
In this example, the system storage unit (SSU) 4A manages the program (for example, a hardware control program: HCP) version numbers. The program version number of each of clusters 3A and 3B and SSUs 4A and 4B needs to be a version number that can be incorporated in the system, as managed by system storage unit 4A (hereinafter referred to as the "master SSU-SVP").
Each of clusters 3A and 3B includes a plurality of system boards 20A to 20D, an interface circuit 22, and I/O ports 24A to 24D.
Each of system boards 20A to 20D includes a CPU 30, a system controller (SC) 32, a memory access controller (MAC) 34, and an internal memory (DIMM) 36.
System controller 32 is connected to CPU 30, memory access controller 34, and I/O ports 24A to 24D. CPU 30 performs reading from and writing into memory 36, and performs desired processing. Furthermore, system controller 32 is connected to the system controller 32 of another system board, and performs transmission and reception of data and commands with that system board. In addition, system controller 32 is connected to interface circuit 22, so that CPU 30 of each of system boards 20A to 20D transmits and receives commands and data to and from system storage units (SSUs) 4A and 4B.
SVP 9 is the service processor, described above, of each of clusters 3A and 3B.
A description is given below of the configuration of the system storage units (SSUs) 4A and 4B. Each of SSUs 4A and 4B includes an interface circuit 40 connected to the clusters, a memory access controller 42, and a memory array 44.
In addition, SSUs 4A and 4B each include a system configuration control circuit (CNFG) 46 that sets the configuration of the system storage unit, a priority control circuit (PRIO) 48 that performs priority control of memory, an SSU-SVP 12, and a system console interface circuit (SCI) 12-1 through which SVP 12 is connected to each internal circuit (for example, interface circuit 40, memory access controller 42, memory array 44, configuration control circuit 46, and priority control circuit 48) in order to configure settings on the internal circuits.
Each memory access controller 42 includes a port control manager (PCM) 50 connected to interface circuit 40, an array controller (ARY) 52, which is connected to port control manager 50, for accessing a memory 54, and memory 54.
Memory array 44, which is connected to port control manager 50, includes an array controller (ARY) 56 for accessing a memory 58, and memory 58.
SSU-SVP 12 is the service processor, described above, of each of SSUs 4A and 4B. In the drawings, the same components as those described above are designated by the same reference numerals. Add-on cluster 3C to be described below has the same configuration as that of clusters 3A and 3B.
A description is given below of a program update process at the time of a cluster add-on according to a first embodiment.
(S10) When add-on cluster 3C is connected to LAN 5 and its power supply is switched on with a CE operation, add-on cluster 3C requests the master SSU-SVP for the HCP version number and the version number management table.
(S12) The master SSU-SVP (the SVP 12 of master SSU 4A) transmits the version number management table and the version number of the in-operation HCP of clusters 3A and 3B to add-on cluster 3C.
(S14) Add-on cluster 3C compares the received version number with the version number of the HCP installed into add-on cluster 3C, and requests remote monitoring center 200 to distribute the patch of the in-operation version number.
(S16) Add-on cluster 3C receives the requested patch from remote monitoring center 200 and updates the standby-system HCP that has been installed into add-on cluster 3C.
(S18) The standby-system HCP of add-on cluster 3C is switched to the operation-system HCP, and add-on cluster 3C is incorporated into the multi-cluster system.
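The following sketch is an illustration only, not the disclosed implementation: the classes, method names, and example version strings are assumptions; only the order of interactions in steps S10 to S18 follows the description above.

```python
# Illustrative sketch of the add-on update sequence (S10 to S18). The classes
# and method names are assumptions made for illustration.

class MasterSsuSvp:
    def __init__(self, version_table, in_operation_version):
        self.version_table = version_table
        self.in_operation_version = in_operation_version

    def report_versions(self):
        # S12: return the version number management table and the version
        # number of the HCP in operation on the existing clusters.
        return self.version_table, self.in_operation_version

class MonitoringCenter:
    def distribute_patch(self, version):
        # S14/S16: distribute the patch of the requested version number.
        # A real patch would carry data; the version string stands in here.
        return version

class AddOnCluster:
    def __init__(self, installed_version):
        self.standby_hcp = installed_version
        self.operation_hcp = None

    def start_up(self, master, center):
        # S10: on power-on, request the table and the in-operation version.
        _table, target = master.report_versions()
        # S14/S16: fetch the patch of the in-operation version if needed.
        if self.standby_hcp != target:
            self.standby_hcp = center.distribute_patch(target)
        # S18: switch the standby-system HCP to the operation-system HCP.
        self.operation_hcp = self.standby_hcp

cluster_3c = AddOnCluster("E60L02G-02B+7")
cluster_3c.start_up(MasterSsuSvp({}, "E60L02G-02B+8"), MonitoringCenter())
assert cluster_3c.operation_hcp == "E60L02G-02B+8"
```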
As described above, in the present embodiment, add-on cluster 3C is provided with functions of receiving a patch from the remote monitoring center, switching between the HCPs, and being incorporated into the multi-cluster system. That is, in the present embodiment, add-on cluster 3C is provided with functions (for example, a program) of requesting the SSU-SVP for the HCP version number in response to the power supply being switched on, receiving a version number management table, and requesting the remote monitoring center for the patch. Therefore, the add-on cluster can be updated without the CE operation being halted to wait for the version number management table of the periodic communication, and the add-on cluster can be incorporated into the multi-cluster system. As a result, it is possible to shorten the CE operation time.
Even when a cluster whose operation has been halted for a long period of time for the convenience of the user is to be incorporated into the multi-cluster system once more, in the present embodiment, the update of the HCP version number can be performed instantly, so that the HCP version number of the cluster to be incorporated matches those of the other clusters. Consequently, such a cluster can be easily incorporated into the multi-cluster system without separately providing an HCP whose version number matches that of the system in operation.
A description is given below of a program update process at the time of a cluster add-on according to a second embodiment.
(S20) Add-on cluster 3C is connected to LAN 5, and its power supply is switched on with a CE operation.
(S22) With a CE operation, an HCP copy instruction is issued from add-on cluster 3C to clusters 3A and 3B in an operation state. That is, the HCP is divided into logical volumes, and add-on cluster 3C requests a different logical volume from each of clusters 3A and 3B.
(S24) The logical volume of the HCP is copied from each of clusters 3A and 3B to add-on cluster 3C. For example, clusters 3A and 3B transfer their respective logical volumes to add-on cluster 3C in parallel.
(S26) A request for the next logical volume is made from add-on cluster 3C. That is, add-on cluster 3C requests the next logical volume from the cluster for which the copy has been completed, without waiting for the completion of the copy from the other cluster.
(S28) When all of the logical volumes have been copied, the copied HCP, which matches the HCP of the in-operation clusters, is switched to the operation-system HCP of add-on cluster 3C.
(S30) Add-on cluster 3C is incorporated into the multi-cluster system.
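As an illustration of the divided-copy scheme of steps S22 to S26 (not part of the original disclosure; the function names, data shapes, and thread-pool realization are assumptions), the add-on cluster can be modeled as drawing volumes from a shared work queue, so that whichever source cluster finishes first immediately receives the next request.

```python
# Sketch of the divided-copy scheme (S22 to S26): the HCP is divided into
# logical volumes, and the add-on cluster requests the next volume from
# whichever in-operation cluster finishes first.

from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue

def copy_hcp(volumes, source_clusters):
    """Copy all divided logical volumes of the HCP to the add-on cluster,
    distributing the copy requests over the in-operation source clusters."""
    pending = Queue()
    for volume in volumes:
        pending.put(volume)
    copied = []  # (source cluster, volume) pairs, in completion order

    def serve(cluster):
        # A cluster that finishes one volume immediately receives the next
        # request, without waiting for the other clusters (S26).
        while True:
            try:
                volume = pending.get_nowait()
            except Empty:
                return
            copied.append((cluster, volume))  # stands in for the transfer

    with ThreadPoolExecutor(max_workers=len(source_clusters)) as pool:
        for cluster in source_clusters:
            pool.submit(serve, cluster)
    return copied

# Example: four logical volumes copied from clusters "3A" and "3B" in parallel.
print(copy_hcp(["vol1", "vol2", "vol3", "vol4"], ["3A", "3B"]))
```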
In the second embodiment, add-on cluster 3C is provided with functions (for example, a program) of making a request for copying an HCP from the add-on cluster to a plurality of clusters during operation, switching between HCPs, and being incorporated into the multi-cluster system. Thus, the add-on cluster can be updated without the CE operation being halted to wait for a version number management table to be received from master SSU 4A, and can be incorporated into the multi-cluster system. Consequently, it is possible to shorten the CE operation time.
In the second embodiment, when the HCP version number of the cluster that is newly incorporated is the up-to-date HCP version number, there is provided a mechanism for downgrading this up-to-date HCP by copying the HCP of an in-operation cluster. For this reason, even if the HCP version number of the cluster that is being operated is an old version number, the HCP of the add-on cluster can be made to match that of the in-operation cluster, so that the add-on cluster can be incorporated into the multi-cluster system.
Furthermore, even if the amount of HCP data is large, the divided logical volumes are copied from a plurality of clusters, and a request for the next logical volume is made to the cluster for which the copy has been completed, without waiting for the completion of the copy from the other clusters. Consequently, it is possible to distribute the load among the requested clusters and to copy the HCP at high speed. For this reason, the influence on the operation state can be minimized, and the add-on cluster can be incorporated into the multi-cluster system.
In the above-described embodiments, the program updating of the add-on cluster has been described using an example of an HCP. Alternatively, the program updating of the add-on cluster can be applied to other firmware programs and application programs. Furthermore, the multi-cluster system has been described using two clusters and two system storage units. Alternatively, the multi-cluster system may have three or more clusters, and the number of system storage units may be one or more. In addition, the configuration of clusters and system storage units is not limited to that of the described embodiments.
In addition, the program updating of the add-on cluster has been described using a network in which a plurality of clusters and a plurality of system storage units are individually connected to one another. However, they may be connected using a common network or a connection may be made between clusters.
All examples and conditional language recited herein are intended for pedagogical objects to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Although the embodiment(s) of the present inventions have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.