The present application claims priority from Japanese Patent Application JP2011-276621 filed on Dec. 19, 2011, the content of which is hereby incorporated by reference into this application.
The present invention relates to an information processing system and an operation management method for an information processing system, and more particularly to the efficient use of the resources of an information processing system.
When a plurality of work loads share the resources of an information processing system, the work loads possibly affect each other's performances. There is a technique that optimizes the operations of a plurality of applications by adjusting application parameters in the case where the resources configuring an information processing system are shared between the applications (International Publication No. WO2008/006027).
In the technique disclosed in International Publication No. WO2008/006027, processing overhead occurs for optimizing application parameters. In contrast, in allocating a plurality of work loads in an information processing system including resources possibly shared between the work loads, when allocation is made so as not to overload a resource shared between the work loads (in the following, referred to as a shared resource), for example, it is unnecessary to optimize application parameters, and the performance of the entire system is improved.
The present invention is made in view of the above problem. It is an object of the present invention to prevent overload on a shared resource.
In a representative aspect of the present application, in an information processing system including a resource possibly shared between work loads, an additional load is applied to the resource, and the performances of the work loads are monitored while the work loads are operated.
According to the aspect of the present invention, it is possible to allocate work loads based on the result of monitoring the performances of the work loads, and it is possible to prevent overload on a shared resource.
The present invention will become fully understood from the detailed description given hereinafter and the accompanying drawings, wherein:
In the following, embodiments of the present invention will be described with reference to the drawings.
The server device 160, the server device 170, and the manager server device 200 are connected to each other through the network switch 1302. Furthermore, the server device 160, the server device 170, and the storage device 150 are connected to each other through the SAN switch 1402. Therefore, the network switch 1302 and the SAN switch 1402 are shared between the server device 160 and the server device 170.
It is noted that in the embodiment, the network switches 1301 and 1302 are formed under the same specifications. Furthermore, in the embodiment, the SAN switches 1401 and 1402 are formed under the same specifications. In addition, the information processing system 100 can expand the system by further increasing sets of server devices, network switches, and SAN switches having connections similar to the connections between the server devices 160 and 170, the network switch 1302, and the SAN switch 1402 as necessary.
The server device 110 that is an information processing apparatus includes a processor 111, a memory 112, a network interface (I/F) 113, and a host bus adapter (HBA) 114 that is an interface to connect to the storage device 150. Such an example is illustrated in a state in which virtual machines (VM) 1151 and 1152 are stored on the memory 112. Such an example is illustrated in a state in which a work load (WL) 1161 is operated on the virtual machine 1151 and a work load 1162 is operated on the virtual machine 1152.
In the embodiment, the server device 120 is formed under the same specifications as the server device 110, including a processor 121, a memory 122, a network interface (I/F) 123, and a host bus adapter 124 that is an interface to connect to the storage device 150. Such an example is illustrated in a state in which virtual machines 1251 and 1252 are stored on the memory 122. Such an example is illustrated in a state in which a work load 1261 is operated on the virtual machine 1251.
In the embodiment, the server device 160 is formed under the same specifications as the server device 110, including a processor 161, a memory 162, a network interface (I/F) 163, and a host bus adapter 164 that is an interface to connect to the storage device 150. Such an example is illustrated in a state in which virtual machines 1651 and 1652 are stored on the memory 162. In the embodiment, it is supposed that the server device 170 is formed under the same specifications as the server device 110, including a processor 171, a memory 172, a network interface (I/F) 173, and a host bus adapter 174 that is an interface to connect to the storage device 150. Such an example is illustrated in a state in which virtual machines 1751 and 1752 are stored on the memory 172.
The manager server device 200 includes a processor 201, a memory 202, a network interface (I/F) 203, and a host bus adapter 204 that is an interface to connect to the storage device 150.
In the embodiment, the measurement server device 250 is formed under the same specifications as the server device 110, including a processor 251, a memory 252, a network interface (I/F) 253, and a host bus adapter 254 that is an interface to connect to the storage device 150. Such an example is illustrated in a state in which virtual machines 2551 and 2552 are stored on the memory 252. Such an example is illustrated in a state in which a work load 2561 is operated on the virtual machine 2551.
The server device 110, the server device 120, the server device 160, and the server device 170 process work loads inputted to the information processing system 100. An administrator determines the allocation of the virtual machines to the server devices 110, 120, 160, and 170. The manager server device 200 determines which one of the server devices 110, 120, 160, and 170 a work load is allocated to, according to a method described later.
The manager server device 200 allocates the work loads inputted to the information processing system 100 based on the result of monitoring the performances of the work loads while applying, to a resource such as the network switch 1301 and the SAN switch 1401, the loads caused by the work loads and an additional load; such a resource possibly affects the performances of the work loads due to conflict when the resource is shared between the work loads.
The measurement server device 250 receives an instruction from the manager server device 200, operates a test work load that communicates with the dummy load generating unit 205 through the network switch 1301, and thus applies a load to the network switch 1301. It is noted that in the embodiment, the server device 250 is referred to as a measurement server device for easy understanding. The measurement server device 250 is a server device to which a test work load is applied in measuring work load performance, described later. In the case where work load performance is not measured, the server device 250 can function as a server similarly to the server device 110.
The storage device 150 stores performance policy information 151, a performance table 152, and system configuration information 153, which are used in processing in the manager server device 200. Moreover, the storage device 150 stores work loads, programs, data, and so on used in the information processing apparatuses.
The performance policy information 151 includes parameters indicating the necessary performances of work loads that are to be maintained at the minimum, allowing for the deterioration of performance due to conflict between a work load and another work load sharing a resource when the work loads are operated on the information processing system 100. The performance policy information 151 is inputted by a user as parameters before allocating a work load, for example.
The resource information 2002 is information that identifies a resource to which an additional load is applied in measuring work load performance, described later, for the work load identified by the ID number 2001. In the resource information 2002, at least one resource is inputted, as for the ID numbers “1”, “2”, “3”, and “5”, or no resource is inputted, as for the ID number “4”, in the case where it is unnecessary to measure work load performance.
The information 2003 about work loads to make a set contains a number identifying a work load whose performance is measured in combination with the work load identified by the ID number 2001. In the case where the work load identified by an ID number is alone subjected to work load performance measurement, no data is inputted to the information 2003 about work loads to make a set, as for the ID number “5”.
The work load disposition constraint information 2004 includes information about a work load disposition constraint applied to the work load identified by the ID number 2001 and the work loads to make a set described in the information 2003. Data is inputted to the work load disposition constraint information 2004 in the case where applications that are work loads are disposed in a plurality of server devices, with synchronization and data transfer between the applications through a network device, because of redundancy such as mirroring or parallel processing in a plurality of calculation nodes, for example.
The necessary performance information 2005 includes the necessary performances of the work load identified by the ID number 2001 and the work loads to make a set in the information 2003; the necessary performances are sought by the user, as necessary.
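For illustration, one entry of the performance policy information 151 could be represented as in the following sketch. The field names and the example values are assumptions introduced here for clarity; the description above defines only the columns 2001 through 2005.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative representation of one row of the performance policy
# information 151. Field names are assumptions; the columns correspond to
# the ID number 2001, the resource information 2002, the set information
# 2003, the disposition constraint 2004, and the necessary performance 2005.
@dataclass
class PolicyEntry:
    id_number: int                                 # 2001
    resources: List[str]                           # 2002: resources to load
    set_work_loads: List[str]                      # 2003: work loads in a set
    disposition_constraint: Optional[str] = None   # 2004
    necessary_performance: Optional[float] = None  # 2005: e.g. tps

# An entry like the ID number "4", for which no performance measurement is
# needed, simply carries an empty resource list, and an entry like the ID
# number "5", measured alone, carries an empty set list.
entry = PolicyEntry(id_number=4, resources=[], set_work_loads=[])
print(entry.resources == [] and entry.set_work_loads == [])
```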
The system configuration information 153 includes system configuration information about the information processing system 100. For example, the system configuration information 153 includes information about the types and specifications of the server devices, the network switches, the SAN switches, and the storages included in the information processing system 100 and the connections between these devices.
The performance table 152 includes data showing the degrees of the deterioration of the performances of the work loads whose necessary performances are included in the performance policy information 151. The degrees of the deterioration are obtained by measuring work load performance, described later; the deterioration of the performances of the work loads is due to conflict with other work loads sharing resources. In the embodiment, the performance table 152 includes data of the relationship between the load amounts applied to the resources listed in the resource information 2002 and the performances, at those load amounts, of the work loads whose necessary performances are included in the performance policy information 151.

Here, among resources, the resources covered by the performance policy information 151 and the performance table 152 are resources that possibly affect the performances of the work loads because the performances of the resources seen from the work loads are degraded due to conflict. The degradation of the performances of the resources seen from the work loads means that the resources are shared and their performances are thus degraded; for example, in the case of the network switch, communication throughput for the work loads is degraded.

Examples of the load amounts on the shared resources include used bandwidths for the network interfaces, used bandwidths and IOPS for the storage array, the CPU usage rate for the processors, the memory bandwidth for a memory controller, and so on. Examples of work load performances to be measured include the number of transactions per unit time in online transaction processing, the average response time of requests in a web server, and so on. Moreover, in the case where it is known that CPU loads are dominant, as in scientific computation processing, the CPU time used per unit time can be used as an index of work load performance.
In the information processing system 100 according to the embodiment, performance tables showing the quantitative relationship between the load amounts on the shared resources and the performances of the work loads on the load amounts are held, so that the influence on the performances of the work loads sharing the resources can be quantitatively predicted, and the allocation of the work loads can be determined based on the prediction. Therefore, the performance of the work load can be easily secured.
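For illustration, a performance table holding pairs of a load amount on a shared resource and a measured work load performance could be consulted by linear interpolation, as in the following sketch. The function, the table values, and the use of interpolation are assumptions introduced here; the embodiment does not prescribe a particular prediction method.

```python
from bisect import bisect_left

def predict_performance(table, load):
    """Predict work load performance at a given shared-resource load amount.

    table: list of (load_amount, performance) pairs sorted by load amount,
    as could be held in the performance table 152. Values outside the
    measured range are clamped to the nearest measured point.
    """
    loads = [l for l, _ in table]
    if load <= loads[0]:
        return table[0][1]
    if load >= loads[-1]:
        return table[-1][1]
    i = bisect_left(loads, load)
    (l0, p0), (l1, p1) = table[i - 1], table[i]
    # linear interpolation between the two surrounding measured points
    return p0 + (p1 - p0) * (load - l0) / (l1 - l0)

# illustrative values: throughput in tps of a work load versus the load
# amount (e.g. Mbit/s of dummy load) applied to a shared network switch
table = [(0, 200.0), (400, 180.0), (800, 120.0)]
print(predict_performance(table, 600))  # → 150.0, midway between 180 and 120
```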
An exemplary conflict between work loads sharing resources will be described with reference to
In the following, the operation of the work load performance acquiring unit 206 will be described with reference to
The virtual machines 1151 and 1152 are stored on the memory 112 of the server device 110. In the virtual machine 1151, a work load 1163 is operated. The virtual machines 1251 and 1252 are stored on the memory 122 of the server device 120. In the virtual machine 1251, a work load 1262 is operated.
The virtual machines 2551 and 2552 are stored on the memory 252 of the measurement server device 250. In the virtual machine 2551, a work load 2562 that applies a load to at least one of the network switch 1301 and the SAN switch 1401 is operated in cooperation with the dummy load generating unit 205 of the manager server device 200.
The work load 1163 and the work load 1262 apply loads to the network switch 1301 by communicating data with each other during operation. During the operation of the work load 1163 and the work load 1262, the manager server device 200 changes the load amount applied to at least one of the network switch 1301 and the SAN switch 1401 by the dummy load generating unit 205 and the work load 2562, and the work load performance acquiring unit 206 acquires the performance values of the work loads 1163 and 1262 for every load amount on the network switch 1301 and the SAN switch 1401, which are resources possibly shared between the work loads.
First, the user inputs information necessary to acquire the performance table 152 through the manager server device 200 (Step 800). The information to be inputted includes the performance policy information 151 and the system configuration information 153. Subsequently, the manager server device 200 acquires the performance policy information 151 and the system configuration information 153 out of the storage device 150, and stores the information on the memory 202 (Step 801).
In Step 802, the manager server device 200 selects a work load whose performance is acquired in ascending order of numbers in the work load ID number 2001, for example. Subsequently, the manager server device 200 selects one server device, or a plurality of server devices, as necessary, used for measuring the performance of a work load based on the system configuration information 153 (Step 803). Moreover, the manager server device 200 selects one measurement server device to generate a dummy load, as necessary (Step 804). The user can in advance determine information about candidates for the server device and the measurement server device for measuring the performance value of the work load, and can store the information in the system configuration information 153.
In Step 805, the manager server device 200 instructs the server devices selected in Step 803 to acquire work loads to be measured out of the storage 150. The instructed server devices then acquire the instructed work loads out of the storage 150 (Step 806).
In Step 807, the manager server device 200 instructs the server devices selected in Step 803 to start to operate the acquired work loads. The instructed server devices then start to operate the acquired work loads (Step 808). The manager server device 200 then instructs the servers starting the operation to report, to the manager server device 200, information about the load amount applied by the work load to the resource specified in the resource information 2002 (Step 809). The server devices instructed to report the information report, to the manager server device 200, the information about the load amount applied by the work load to the resource specified in the resource information 2002 (Step 810).
In Step 811, as the preparation for additionally applying a dummy load, when there is a measurement server device selected in Step 804, the manager server device 200 causes the measurement server device to acquire, out of the storage device 150, a work load to apply a load to the resource specified in the resource information 2002 in cooperation with the dummy load generating unit 205. In the case where there is the measurement server device selected in Step 804, the dummy load generating unit 205 and the measurement server device start to additionally apply a load, that is, a dummy load, to the resource specified in the resource information 2002 (Step 812).
In Step 813, the manager server device 200 adjusts the dummy load amount applied to the resource specified in the resource information 2002. The work load performance acquiring unit 206 then instructs the servers selected in Step 803 to report information about the performance values of the work loads being operated on the servers and information about the load amounts that the work loads being operated on the servers apply to the resources specified in the resource information 2002 (Step 814). The servers selected in Step 803 then measure the load amounts that the work loads being operated on the servers apply to the resources specified in the resource information 2002 (Step 815). Moreover, the servers selected in Step 803 measure the performance values of the work loads being operated on the servers (Step 816). The servers selected in Step 803 send information about the measurement results in Steps 815 and 816 to the manager server device 200 (Step 817). The manager server device 200 stores information received from the servers selected in Step 803 in the storage 150 (Step 818).
In Step 819, the manager server device 200 determines whether the performance values, which are the work load performance information received in Step 818, are below the necessary performance sought by the user in the necessary performance information 2005. In the case where it is determined in Step 819 that the performance values of the work loads are not below the thresholds, that is, the necessary performance sought by the user in the necessary performance information 2005, the manager server device 200 determines a dummy load amount to be subsequently measured (Step 820), and the process goes to Step 813. The dummy load amount is increased every time the process in Step 820 is performed. In the case where it is determined in Step 819 that the performance values of the work loads are below the thresholds, the process goes to Step 821.
In Step 821, the manager server device 200 determines whether processing is completed on all the work loads in the work load ID number 2001. In Step 821, in the case where the manager server device 200 determines that processing is not completed on all the work loads in the work load ID number 2001, the process goes to Step 802. In Step 821, in the case where the manager server device 200 determines that processing is completed on all the work loads in the work load ID number 2001, the process is ended.
As described above, the information processing system 100 repeats the processes from Step 813 to Step 818. Namely, in the case where there are the selected work load and the work load used in combination with the selected work load, the information processing system 100 applies an additional load to a resource, and monitors the performances of the selected work load and the work load used in combination with the selected work load as necessary while operating the selected work load and the work load used in combination with the selected work load on the server device, which is the information processing apparatus.
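The repetition of Steps 813 through 820 can be condensed into the following sketch. The `measure_performance` callback standing in for Steps 814 through 817, the initial load amount, and the step size are assumptions introduced here for illustration.

```python
def build_performance_table(measure_performance, necessary_performance,
                            initial_load=0, step=100):
    """Increase the dummy load on the shared resource and record one
    (dummy load amount, performance) row per iteration, until the work
    load's performance falls below the necessary performance."""
    table = []
    dummy_load = initial_load
    while True:
        perf = measure_performance(dummy_load)  # Steps 814-817: measure
        table.append((dummy_load, perf))        # Step 818: store the result
        if perf < necessary_performance:        # Step 819: below threshold?
            break                               # yes: go to Step 821
        dummy_load += step                      # Step 820: next load amount
    return table

# illustrative model: performance degrades linearly with the dummy load
table = build_performance_table(lambda d: 200.0 - 0.1 * d, 150.0)
print(table[-1])  # → (600, 140.0): the first row below the threshold
```

This also makes explicit the design choice visible in the flowchart: the loop keeps the last measurement that violates the necessary performance, so the resulting table brackets the load amount at which the work load stops meeting its policy.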
The performance table 152, which is the monitored result, shows the range where desired performances of the work loads can be obtained even though the resource is shared, so that it is possible to allocate the work loads to the server devices, which are the information processing apparatuses, so as to obtain desired performances.
In a second embodiment, the allocation of work loads is exemplified in the case where work loads described in performance policy information 151 in
For the initial state, suppose that the server devices 110, 120, 160, 170, and 250 are stopped for decreasing the power consumption of the information processing system 100. Moreover, according to the method described in the flowchart in
In the following, the content of specific allocation according to the embodiment will be described.
In the embodiment, the manager server device 200 extracts work loads in ascending order of numbers in the work load ID number 9001. Therefore, first, the work load A of the work load ID number “1” and the work load B are extracted.
The manager server device 200 selects server devices to operate the work load A and the work load B based on the system configuration information 153, and stores the server devices on the memory of the manager server device 200 as the server devices scheduled to operate. For the order of selecting the server devices, such a configuration may be possible in which the order determined by the user in advance is stored in the storage device 150 as information about selection order, and the information is used. Here, the manager server device 200 determines that the work load A is operated on the server device 110 and the work load B is operated on the server device 120, and stores the server devices 110 and 120 on the memory of the manager server device 200 as the server devices scheduled to operate.
Subsequently, the manager server device 200 extracts the work load C and the work load D according to the ID number 9001. In order to reduce the number of server devices to operate for power savings, the manager server device 200 assumes that the work load C is operated on the server device 110 and the work load D on the server device 120, and causes the work load performance predicting unit 207 to calculate, based on the performance table 152, the predicted performances of the work loads A to D under the assumption that the work loads A to D apply their loads to the network switch 1301 specified in the resource information 9002.
The work load performance predicting unit 207 calculates the prediction, and the manager server device 200 determines the allocation of the work loads based on the necessary performance 9005 set by the user and the result calculated by the work load performance predicting unit 207. In the case of the embodiment, since it is determined that the performance of the work load C does not satisfy 150 tps, which is the performance of the work load set by the user, the manager server device 200 makes reference to the system configuration information 153 and work load disposition constraint information about work loads to make a set in the information 9003, and extracts the server devices 160 and 170 that are not connected to the network switch 1301 but connected to the network switch 1302. The manager server device 200 determines that the work load C is operated on the server device 160 and the work load D on the server device 170, and stores the server devices 160 and 170 on the memory of the manager server device 200 as the server devices scheduled to operate.
Subsequently, the manager server device 200 extracts the work load E and the work load F according to the ID number 9001. In order to reduce the number of server devices to operate for power savings, the manager server device 200 assumes that the work load E is operated on the server device 110 and the work load F on the server device 120, and causes the work load performance predicting unit 207 to calculate, based on the performance table 152, the predicted performances of the work loads A, B, E, and F under the assumption that the work loads A, B, E, and F apply their loads to the network switch 1301 specified in the resource information 9002.
The work load performance predicting unit 207 calculates the prediction, and the manager server device 200 determines the allocation of the work loads based on the necessary performance 9005 set by the user and the result calculated by the work load performance predicting unit 207. In the case of the embodiment, since it is determined that the performances of the work loads A, B, E, and F satisfy all the necessary performances of the work loads set by the user, the manager server device 200 determines that the work load E is operated on the server device 110 together with the work load A and the work load F is operated on the server device 120 together with the work load B.
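The allocation policy described above can be sketched as follows. The `predict` function stands in for the work load performance predicting unit 207, and the server names and the toy performance model are assumptions introduced here for illustration.

```python
def allocate(pairs, predict, shared_servers, spare_servers):
    """Greedy consolidation: first assume each set of work loads onto the
    servers behind the shared switch (for power savings); move the set to
    the spare servers only if a predicted performance falls below its
    necessary performance."""
    allocation = {}
    placed = []  # work loads already sharing the first network switch
    for wl_a, wl_b, required in pairs:
        trial = placed + [wl_a, wl_b]
        if all(predict(w, trial) >= required for w in trial):
            # prediction satisfies every necessary performance: consolidate
            allocation[wl_a], allocation[wl_b] = shared_servers
            placed = trial
        else:
            # move the set to the servers behind the other switch
            allocation[wl_a], allocation[wl_b] = spare_servers
    return allocation

# toy predictor: performance drops 30 tps per co-located work load
predict = lambda w, group: 210 - 30 * (len(group) - 1)
alloc = allocate([("A", "B", 150), ("C", "D", 150)], predict,
                 ("110", "120"), ("160", "170"))
print(alloc)  # A and B consolidated; C and D moved to the other switch
```

In this sketch, as in the embodiment, the set of the work loads C and D is moved to the server devices 160 and 170 because co-locating all four work loads on the network switch 1301 is predicted to violate the necessary performance.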
As described above, since the allocation of the work loads A to F is determined, the manager server device 200 causes the work load allocation and resource control unit 208 to cancel the halt of the server devices stored on the memory of the manager server device 200 as the server devices scheduled to operate, and operates the server devices. In the embodiment, the manager server device 200 operates the server device 110 and the virtual machines 1151 and 1152 on the server device 110, the server device 120 and the virtual machines 1251 and 1252 on the server device 120, the server device 160 and the virtual machines 1651 and 1652 on the server device 160, and the server device 170 and the virtual machines 1751 and 1752 on the server device 170. The manager server device 200 then disposes the work load A on the virtual machine 1151, the work load B on the virtual machine 1251, the work load C on the virtual machine 1651, the work load D on the virtual machine 1751, the work load E on the virtual machine 1152, and the work load F on the virtual machine 1252, and operates the work loads A to F.
As described above, even in the state in which the network switch 1301 is a shared resource, it is possible to implement allocation in which the work loads A to F can be operated in the state in which the necessary performances set by the user are satisfied. It is noted that in the embodiment, the ID number 9001 ranges from “1” to “3”. However, even in the case where there are an ID number “4” and more, allocation can be similarly made as for the ID numbers “1” to “3”.
When scheduled allocation for all the ID numbers in the ID number 9001 is finished, the work load allocation and resource control unit 208 instructs the server devices to acquire the work loads according to the result of allocation described above, and instructs the server devices to operate the acquired work loads.
In the embodiment, the embodiment is configured in which a single work load is disposed on a single virtual machine. However, a plurality of work loads may be disposed on a single virtual machine. Moreover, in the embodiment, the performances of the work loads A to F are predicted. For example, in the case where the performance of the work load A is more important than the performance of the work load B in the set of the work loads A and B, it is possible to perform the allocation of the work loads based on the performance of the important work load A when only the necessary performance of the work load A is inputted to the necessary performance information 9005.
In a third embodiment, the allocation of work loads is exemplified in the case where work loads in performance policy information 151 in
For the initial state, suppose that the server devices 110, 120, 160, 170, and 250 are stopped for decreasing the power consumption of the information processing system 100. Moreover, according to the method described in the flowchart in
The allocation and operation of the work loads according to the embodiment can also be implemented according to the flowchart in
In the embodiment, the manager server device 200 extracts work loads in ascending order of numbers in the work load ID number 1301. Therefore, first, the work load J of the work load ID number “1” is extracted.
The manager server device 200 selects a server device to operate the work load J based on the system configuration information 153, and stores the server device on the memory of the manager server device 200 as the server device scheduled to operate. For the order of selecting the server devices, such a configuration may be possible in which the order determined by the user in advance is stored in the storage device 150 as information about selection order, and the information is used. Here, the manager server device 200 determines that the work load J is operated on the server device 110, and stores the server device 110 on the memory of the manager server device 200 as the server device scheduled to operate.
Subsequently, the manager server device 200 extracts the work load K according to the ID number 1301. In order to reduce the number of server devices to operate for power savings, the manager server device 200 assumes that the work load K is operated on the server device 110, and causes the work load performance predicting unit 207 to calculate, based on the performance table 152, the predicted performances of the work loads J and K under the assumption that the work loads J and K apply their loads to the SAN switch 1401 specified in the resource information 1302.
The work load performance predicting unit 207 calculates the prediction, and the manager server device 200 determines the allocation of the work loads based on the necessary performance 1305 set by the user and the result calculated by the work load performance predicting unit 207. In the case of the embodiment, since it is determined that the performance of the work load J does not satisfy 150 tps, which is the performance of the work load set by the user, the manager server device 200 makes reference to the system configuration information 153, and extracts the server device 160 that is not connected to the SAN switch 1401 but connected to the SAN switch 1402. The manager server device 200 determines that the work load K is operated on the server device 160, and stores the server device 160 on the memory of the manager server device 200 as the server device scheduled to operate.
Subsequently, the manager server device 200 extracts the work load L according to the ID number 1301. In order to reduce the number of server devices to operate for power savings, the manager server device 200 assumes that the work load L is operated on the server device 110, and causes the work load performance predicting unit 207 to calculate, based on the performance table 152, the predicted performances of the work loads J and L under the assumption that the work loads J and L apply their loads to the SAN switch 1401 specified in the resource information 1302.
The work load performance predicting unit 207 calculates the prediction, and the manager server device 200 determines the allocation of the work loads based on the necessary performance 1305 set by the user and the result calculated by the work load performance predicting unit 207. In the case of the embodiment, the manager server device 200 determines that the work load L is operated on the server device 110 together with the work load J, while the work load K is operated on the server device 160.
As described above, since the allocation of the work loads J to L is determined, the manager server device 200 causes the work load allocation and resource control unit 208 to cancel the halt of the server devices stored on the memory of the manager server device 200 as the server devices scheduled to operate. In the embodiment, the manager server device 200 operates the server device 110 and the virtual machines 1151 and 1152 on the server device 110, and the server device 160 and the virtual machines 1651 and 1652 on the server device 160. The manager server device 200 then disposes the work load J on the virtual machine 1151, the work load K on the virtual machine 1651, and the work load L on the virtual machine 1152, and operates the work loads J to L.
As described above, even in the state in which the SAN switch 1401 is a shared resource, it is possible to implement allocation in which the work loads J to L can be operated in the state in which the necessary performances set by the user are satisfied. Moreover, in the embodiment, the server device 110 and the server device 160 are operated, and the other server devices are stopped. As in the embodiment, work loads are put together on a certain information processing apparatus in a range in which a shared resource is not overloaded, so that it is possible to stop or pause information processing apparatuses with no work loads allocated, and it is possible to aim the power savings of the information processing system.
It is noted that in the embodiment, the ID number 1301 ranges from “1” to “3”. However, even in the case where there are an ID number “4” and more, allocation can be similarly made as for the ID numbers “1” to “3”. In the embodiment, the embodiment is configured such that a single work load is disposed on a single virtual machine. However, a plurality of work loads may be disposed on a single virtual machine.
Number | Date | Country | Kind
---|---|---|---
2011-276621 | Dec 2011 | JP | national