This invention relates to a technique for dynamically changing a system configuration.
Dynamic Partitioning (hereinafter called “DP”) is a technique (also called “hot swapping”) for removing and inserting a Central Processing Unit (CPU, also called a “processor”), memory or the like while a system is operating.
Typically, when an occasion for a DP operation, such as a failure of a CPU or memory in a system, is detected, the administrator of the system performs a DP operation on the CPU or memory. However, when a CPU is removed or inserted while the system is operating, the influence of that removal or insertion on the system has to be considered, and there are cases where it is inappropriate to perform the DP operation in response to the detected occasion.
In addition, as a technique for dynamically reconfiguring resources, there is, for example, a technique that performs a proposed reconfiguration operation after determining whether or not the proposed operation follows a resource allocation policy. However, that technique does not give sufficient consideration to DP operations on CPUs.
Patent Document 1: Japanese Laid-open Patent Publication No. 7-295841
Therefore, no conventional technique makes it possible to confirm whether or not a DP operation is appropriate.
A management apparatus relating to this invention includes (A) an acceptance unit that accepts an instruction to dynamically change a processor configuration in a system that includes plural processors, and (B) a processing unit that identifies a performance value of the system corresponding to the processor configuration resulting from the instructed dynamic change, determines whether or not the identified performance value is equal to or greater than a requested performance value for the system, and, upon determining that the identified performance value is equal to or greater than the requested performance value, performs processing to change the processor configuration as instructed by the accepted instruction.
The object and advantages of the embodiment will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the embodiment, as claimed.
Data on errors and the like that occur in the cell on the board 210 is stored in the data storage unit 240 as error logs. Moreover, assume that the controller 230 can obtain load data (e.g. CPU utilization rate, memory utilization amount and/or the like) for the CPU on the board 210. Furthermore, the controller 230 outputs the error log data (also called “error data”) and/or the load data to the management apparatus 100 in response to a request from the management apparatus 100 or the like. The management target system 200 is the same as a conventional system.
The monitoring unit 110 obtains the load data and error data from the management target system 200 periodically or at arbitrary timings. The input and output unit 130 accepts inputs from the administrator of the management target system 200 and outputs alarms, precheck results and the like. The data storage unit 140 stores data generated during processing. The precheck processing unit 120 performs processing to determine in advance whether or not the DP operation should be performed.
The system configuration information storage unit 150 stores system configuration information such as data of a memory configuration on the board 210 in the management target system 200 and/or CPU topology data.
The memory configuration data represents the application status of memory RAS (Reliability, Availability and Serviceability) functions (e.g. memory mirroring, memory sparing, memory error reporting and the like).
In addition, the CPU topology data is data on performance values for respective CPU topologies. An example in which each of three cells includes two CPUs will be explained. In other words, CPU0 and CPU1 are included in cell 1, CPU2 and CPU3 are included in cell 2, and CPU4 and CPU5 are included in cell 3. As illustrated in (a) of
In the case of such a CPU topology, data as illustrated in
In addition, the system load prediction data storage unit 160 stores load prediction data of the management target system 200. The load prediction data is data as illustrated in
In an example of
The system load prediction data storage unit 160 also stores data as illustrated in
Next, operations of the management apparatus 100 will be explained by using
For example, the administrator is notified when a series of correctable errors is detected in the CPU or memory, when a sign of a performance shortage is detected (in other words, when the system load exceeds a threshold), or when any failure in a cell is detected. The administrator performs the DP operation in order to replace the cell in which the error was detected, or to add a cell in order to avoid the performance shortage. However, to confirm whether or not the DP operation can actually be performed, the management apparatus 100 is caused to execute the following processing before the DP operation is actually performed.
Typically, in many cases, administrators do not have sufficient knowledge of the CPU topology as illustrated in
Firstly, the input and output unit 130 accepts input of the details of the DP operation associated with the CPU, and outputs the input data to the precheck processing unit 120 (
Then, the precheck processing unit 120 performs precheck processing (step S3). The precheck processing will be explained by using
Firstly, the precheck processing unit 120 obtains error data for a predetermined time period, which is stored in the data storage unit 240 of the management target system 200, through the monitoring unit 110 and the controller 230 of the management target system 200, and stores the obtained error data in the data storage unit 140 (
Moreover, the precheck processing unit 120 obtains load data from the controller 230 through the monitoring unit 110, and stores the obtained load data in the data storage unit 140 (step S13).
Then, based on the number of the cell to be removed by the DP operation, the precheck processing unit 120 uses the CPU topology data stored in the system configuration information storage unit 150 to identify the CPU topology resulting from the DP operation and the corresponding performance data (step S15). For example, when the current CPU topology (i.e. cell configuration) corresponds to the state of (a) of
Furthermore, the precheck processing unit 120 reads out the load prediction data from the system load prediction data storage unit 160 (step S17). Data that represents the temporal change of the system load as illustrated in
In addition, the precheck processing unit 120 reads out application status data of the memory RAS functions from the system configuration information storage unit 150 (step S18).
Steps S11 to S18 are preprocessing. Step S11 may instead be executed immediately before step S19, step S13 immediately before step S21, step S15 immediately before step S23, and step S18 immediately before step S25.
The processing shifts to processing of
When a burst error has occurred, it is inappropriate to perform the DP operation, so the precheck processing unit 120 sets NG (i.e. the DP operation cannot be performed) as the precheck result (step S29). Then, the processing returns to the calling-source processing.
On the other hand, when no burst error has occurred, the precheck processing unit 120 determines, based on the obtained load data, whether or not the management target system 200 is in an overload state (step S21). Specifically, it is determined whether or not the current load (e.g. CPU utilization ratio, memory utilization rate and the like) exceeds a threshold (e.g. 90%). This is because performing the DP operation in the overload state degrades performance, and the impact on the entire system may become large. Also in this step, it may be confirmed, based on the system load prediction data as illustrated in
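The burst-error test can be sketched as counting logged errors within a recent time window, in line with the later description of errors occurring "at a frequency that is equal to or greater than a first predetermined level". This is a minimal Python sketch, not the method of the embodiment; the function name `has_burst_error`, the window length and the count threshold are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable


def has_burst_error(error_times: Iterable[datetime], now: datetime,
                    window: timedelta, threshold: int) -> bool:
    """Hypothetical burst test: True if at least `threshold` errors
    were logged in the interval [now - window, now]."""
    recent = [t for t in error_times if now - window <= t <= now]
    return len(recent) >= threshold


# Example: four logged errors, three of them within the last ten minutes.
now = datetime(2013, 7, 11, 12, 0)
errors = [now - timedelta(minutes=m) for m in (1, 2, 3, 50)]
burst = has_burst_error(errors, now, timedelta(minutes=10), threshold=3)  # True, so NG
```

Both the window and the threshold would have to be tuned for a real system; the values here are placeholders.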
When the management target system 200 is in the overload state, the processing shifts to step S29. On the other hand, when the management target system 200 is not in the overload state, the precheck processing unit 120 determines whether or not the CPU performance after removing the cell by the DP operation is sufficient within the period of the DP operation (step S23).
For example, a case is considered that the DP operation that makes the transition from (a) of
Here, when the system load prediction is as illustrated in
On the other hand, the performance identified at step S15 for the resulting CPU topology is (1 GHz × 4 CPUs × MP coefficient) × 0.7 (= 2.8 GHz × 1 CPU × MP coefficient), because this CPU topology corresponds to a pattern in which performance deterioration occurs.
Then, when the CPU performance after the cell is removed by the DP operation is compared with the load request during the period of the DP operation, the latter is larger. That is, the performance would be insufficient for the load request during the period of the DP operation. Performing the DP operation at this time is therefore problematic, and the DP operation is deferred.
On the other hand, when the CPU performance after the cell is removed by the DP operation is equal to or greater than the load request during the period of the DP operation, the DP operation can be performed without any problem.
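The arithmetic of this example can be sketched in Python. The cell-to-CPU mapping (two CPUs in each of cells 1 to 3), the 1 GHz clock and the 0.7 deterioration factor come from the text; everything else is an assumption for illustration: the `TOPOLOGY_FACTOR` table, the choice that the 0.7 pattern arises when cell 2 is removed, the MP coefficient value of 1.0, and a load request expressed in the same GHz-equivalent unit as the performance value.

```python
from typing import Dict, FrozenSet, List

# Cell configuration from the example: two CPUs per cell.
CELLS: Dict[int, List[str]] = {1: ["CPU0", "CPU1"], 2: ["CPU2", "CPU3"], 3: ["CPU4", "CPU5"]}

# Hypothetical deterioration factors keyed by the set of cells left after the
# DP operation. Only the 0.7 value appears in the text; assigning it to the
# {cell 1, cell 3} pattern is an assumption. Unlisted patterns use 1.0.
TOPOLOGY_FACTOR: Dict[FrozenSet[int], float] = {frozenset({1, 3}): 0.7}


def performance_after_removal(removed_cell: int, clock_ghz: float,
                              mp_coefficient: float) -> float:
    """Performance of the topology left after removing one cell (cf. step S15)."""
    remaining = frozenset(CELLS) - {removed_cell}
    n_cpus = sum(len(CELLS[cell]) for cell in remaining)
    factor = TOPOLOGY_FACTOR.get(remaining, 1.0)
    return clock_ghz * n_cpus * mp_coefficient * factor


def performance_sufficient(removed_cell: int, clock_ghz: float, mp_coefficient: float,
                           load_request: float) -> bool:
    """Comparison of step S23: proceed only if the remaining performance is
    at least the load request during the DP period."""
    return performance_after_removal(removed_cell, clock_ghz, mp_coefficient) >= load_request


# (1 GHz * 4 CPUs * MP coefficient) * 0.7 with a hypothetical MP coefficient of 1.0
perf = performance_after_removal(2, clock_ghz=1.0, mp_coefficient=1.0)  # about 2.8
ok = performance_sufficient(2, 1.0, 1.0, load_request=3.0)  # False: 3.0 exceeds 2.8
```

Keying the table by the set of remaining cells keeps the lookup independent of which cell was removed, which matches the idea that the performance depends on the resulting topology.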
When the CPU performance after the cell is removed by the DP operation is not sufficient during the period of the DP operation, the processing shifts to step S29. On the other hand, when it is sufficient, the precheck processing unit 120 determines whether or not a condition relating to the memory is satisfied (step S25). More specifically, it is determined, based on the data obtained at step S18, whether or not any memory RAS function is applied, and, based on the data obtained at step S11, whether or not an error has occurred within a predetermined time in a memory to which a memory RAS function is applied.
In some management target systems, the memory RAS function is disabled during the DP operation. If the DP operation is performed in such a system and a memory error or the like occurs during the DP operation, the system may go down. If the DP operation were not performed, the system could continue operating because the error would be recovered by a memory RAS function such as memory sparing. Therefore, when an error has occurred within the predetermined time in a memory to which a memory RAS function is applied, the DP operation is deferred in order to avoid this danger. This condition is considered when a memory RAS function is supported and the cell includes memory. Step S25 may be skipped for a system that does not support the memory RAS function, or when the cell does not include memory.
Therefore, when such a condition is satisfied, the processing shifts to the step S29. On the other hand, when such a condition is not satisfied, the precheck processing unit 120 sets OK as the precheck result (step S27). Then, the processing returns to the calling-source processing.
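The memory condition of step S25 can be sketched as follows. This is a minimal Python sketch under stated assumptions: memories are identified by hypothetical string IDs, the RAS application status from step S18 is modeled as a set of those IDs, and the error data from step S11 as timestamped records; `MemoryErrorRecord` and `memory_condition_met` are illustrative names only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Set


@dataclass(frozen=True)
class MemoryErrorRecord:
    """Hypothetical error-log entry: which memory reported an error, and when."""
    memory_id: str
    time: datetime


def memory_condition_met(ras_applied: Set[str], errors: Iterable[MemoryErrorRecord],
                         now: datetime, window: timedelta, cell_has_memory: bool) -> bool:
    """True (defer the DP operation) if an error occurred within `window`
    in a memory that has a RAS function such as sparing applied."""
    # The condition is not considered when no RAS function is applied
    # or when the cell contains no memory.
    if not ras_applied or not cell_has_memory:
        return False
    return any(e.memory_id in ras_applied and now - window <= e.time <= now
               for e in errors)


now = datetime(2013, 7, 11, 12, 0)
log = [MemoryErrorRecord("DIMM-A", now - timedelta(minutes=5))]
defer = memory_condition_met({"DIMM-A"}, log, now, timedelta(hours=1), cell_has_memory=True)
```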
In this embodiment, whether or not the DP operation can be performed is determined based on burst errors, overload, CPU performance and the memory condition; however, additional conditions may be used.
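The overall order of the checks, where the first failing check ends the precheck with NG, can be sketched as an ordered list of checks; keeping them in a list also leaves room for additional conditions. This Python sketch is illustrative only: the function names and the list-of-checks structure are assumptions, the 90% threshold is the example value from the text, and the burst-error, performance and memory results are stubbed with fixed values.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A check is (name, test); the test returns True when it blocks the DP operation.
Check = Tuple[str, Callable[[], bool]]


def is_overloaded(utilizations: Sequence[float], threshold: float = 0.9) -> bool:
    """Step S21: overload if any load figure (e.g. CPU or memory utilization)
    exceeds the threshold."""
    return any(u > threshold for u in utilizations)


def precheck(checks: List[Check]) -> Tuple[str, Optional[str]]:
    """Run the checks in order; the first blocking check yields NG (step S29),
    otherwise the result is OK (step S27)."""
    for name, blocks in checks:
        if blocks():
            return "NG", name
    return "OK", None


checks: List[Check] = [
    ("burst error", lambda: False),                     # stubbed result
    ("overload", lambda: is_overloaded([0.55, 0.60])),  # step S21
    ("insufficient CPU performance", lambda: True),     # stubbed step S23 result
    ("memory RAS condition", lambda: False),            # stubbed step S25 result
]
result = precheck(checks)  # ("NG", "insufficient CPU performance")
```

Because each test is a callable, later checks are evaluated only when the earlier ones pass, mirroring the branch to step S29 at the first failure.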
Returning to the explanation of the processing in
On the other hand, when the precheck result is NG, the precheck processing unit 120 updates the system configuration information in the system configuration information storage unit 150 according to the details of the DP operation (step S9). The next time the management target system 200 is reactivated, it is reactivated with the system configuration that results from the DP operation. The updated system configuration information may be stored in the data storage unit 240 of the management target system 200 or the like through the monitoring unit 110 and the controller 230.
Furthermore, the precheck processing unit 120 causes the input and output unit 130 to output a message indicating that the DP operation cannot be performed. This lets the administrator recognize that the DP operation cannot be performed at the present time.
According to this embodiment, it becomes possible to automatically determine, in advance, whether or not the DP operation can be performed. Thus, the actual DP operation is performed only after confirming that it can be performed while suppressing the influence on the entire management target system 200, and when the present time is inappropriate, the DP operation is deferred.
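The NG branch can be sketched as recording the DP change in the stored configuration, so that it takes effect at the next reactivation, and returning the message for the administrator. This Python sketch assumes, for illustration only, that the stored configuration is a hypothetical mapping from cell number to an enabled flag and that the DP operation removes cells; `defer_dp_operation` is likewise a hypothetical name.

```python
from typing import Dict, Set, Tuple


def defer_dp_operation(stored_config: Dict[int, bool],
                       cells_to_remove: Set[int]) -> Tuple[Dict[int, bool], str]:
    """NG branch (step S9): build the post-DP configuration to be used at the
    next reactivation, without touching the running system."""
    updated = {cell: enabled and cell not in cells_to_remove
               for cell, enabled in stored_config.items()}
    message = "The DP operation cannot be performed at this time."
    return updated, message


current = {1: True, 2: True, 3: True}
next_boot_config, message = defer_dp_operation(current, {2})
# next_boot_config == {1: True, 2: False, 3: True}; `current` is left unchanged
```

Returning a new mapping rather than mutating the stored one keeps the running configuration and the next-boot configuration clearly separate.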
Although the embodiments of this invention were explained above, this invention is not limited to those. For example, the functional block diagram illustrated in
Furthermore, the example of
Furthermore, the functions of the management apparatus 100 may be shared by plural computers.
In addition, the aforementioned management apparatus 100 is a computer device as depicted in
In addition, as illustrated in
The aforementioned embodiments are outlined as follows:
A management method relating to the embodiments includes (A) upon accepting an instruction to dynamically change a processor configuration in a system that includes plural processors, identifying a performance value of the system corresponding to the processor configuration resulting from the instructed dynamic change; (B) determining whether the identified performance value is equal to or greater than a requested performance value for the system; and (C) upon determining that the identified performance value is equal to or greater than the requested performance value, performing processing to change the processor configuration as instructed by the instruction.
Because the degree of performance deterioration caused by a dynamic change of the processor configuration may vary, determining whether the performance value of the system corresponding to the resulting processor configuration is equal to or greater than the requested performance value makes it possible to determine, in advance, whether or not the dynamic change of the processor configuration is suitable.
The aforementioned requested performance value may be calculated according to the loads in the system, because the performance deterioration caused by the dynamic change of the processor configuration may be acceptable depending on the load of the system.
Furthermore, the aforementioned requested performance value may be calculated according to the loads of the system within a predetermined time from the present time. This copes with cases in which the system load increases during the dynamic change of the processor configuration.
Furthermore, the aforementioned requested performance value may be calculated according to the peak of the loads of the system within a predetermined time, starting from the present time, that is required for the instructed dynamic change. This is because there is no problem as long as the system can handle the peak load.
Furthermore, the aforementioned management method may further include determining whether at least one of the following conditions is satisfied: a first condition that errors have occurred in the system at a frequency equal to or greater than a first predetermined level; a second condition that a load of the system is equal to or greater than a second predetermined level; and a third condition that an error has occurred in a memory of the system to which a memory Reliability, Availability and Serviceability (RAS) function is applied. This is because, besides the performance value of the processors, there are other items whose influence on the entire system should be considered.
Incidentally, it is possible to create a program causing a processor to execute the aforementioned processing, and such a program is stored in a computer-readable storage medium or storage device such as a flexible disk, CD-ROM, DVD-ROM, magneto-optical disk, a semiconductor memory such as ROM (Read Only Memory), or a hard disk. In addition, intermediate processing results are temporarily stored in a storage device such as a RAM.
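Calculating the requested performance value from the peak of the predicted load during the DP operation can be sketched as follows. This minimal Python sketch assumes that the load prediction data is a list of values, one per fixed-length time slot starting at the present time, expressed in the same unit as the performance value; the function name, slot length and sample values are hypothetical.

```python
from typing import Sequence


def requested_performance(predicted_load: Sequence[float], slot_minutes: int,
                          dp_duration_minutes: int) -> float:
    """Peak predicted load over the slots that overlap the DP operation.

    predicted_load[i] is assumed to cover
    [i * slot_minutes, (i + 1) * slot_minutes) measured from the present time.
    """
    if not predicted_load:
        raise ValueError("no load prediction data")
    # Ceiling division, with at least one slot (the current one).
    n_slots = max(1, -(-dp_duration_minutes // slot_minutes))
    return max(predicted_load[:n_slots])


forecast = [2.0, 2.4, 3.1, 2.2, 1.8]  # hypothetical values in 10-minute slots
peak = requested_performance(forecast, slot_minutes=10, dp_duration_minutes=25)  # 3.1
```

If the DP period extends beyond the forecast, this sketch simply uses the slots that are available.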
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present inventions have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuing application, filed under 35 U.S.C. section 111(a), of International Application PCT/JP2013/069056, filed on Jul. 11, 2013, the entire contents of which are incorporated herein by reference.
|  | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2013/069056 | Jul 2013 | US |
| Child | 14988184 |  | US |