This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2014-265796, filed on Dec. 26, 2014, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a job allocation technique.
As servers have become denser and more powerful, the amount of power consumed in a data center has been increasing, and the increase in power consumption has led to an increase in heat generation. A temperature rise due to the increased heat generation can cause thermal runaway or failure of a server. This makes it difficult to operate the data center stably, and brings about adverse effects such as a reduced service life of the server.
To solve the above problems, an air conditioner is provided in a place such as a server room where servers are arranged. However, high-temperature air may stagnate in an area that the air flow generated by the air conditioner does not easily reach (such an area is hereinafter referred to as a hot spot). This phenomenon is described with reference to
Since the servers A to H arranged in a location relatively close to the air conditioner are cooled by a strong air flow sent from the air conditioner, the hot spot hardly occurs around the servers A to H. However, the servers I to P arranged in a location relatively far from the air conditioner are hardly cooled since the wind sent from the air conditioner is weak. In the farthest location from the air conditioner, in particular, within the surroundings of the servers I to P, a hot spot is likely to occur because the air from the air conditioner is not easily reachable.
In this way, how easily the temperature rises varies among the places where servers are installed. How easily the temperature rises also varies due to differences in the processing load of resident basic software and so on. Since a temperature rise poses various problems as described above, jobs are preferably allocated so that a local temperature rise does not occur. However, even if a temperature rise is detected after the execution of a job has started, it is inadvisable to simply terminate the job, because terminating an ongoing job wastes the portion of the job that has already been executed.
There is a technique of allocating tasks and the like based on temperature information. For example, a document discloses a technique that measures the temperature of each of multiple processors, and allocates tasks to the multiple processors based on the measured temperatures.
However, even a method of allocating tasks to processors currently at low temperature, for example, may cause a significant temperature rise, if a task of a high processing load is allocated to a processor in which the temperature tends to rise easily.
Examples of related art include International Publication Pamphlet No. 2003/083693, Japanese Laid-open Patent Publication No. 2006-133995, Japanese Laid-open Patent Publication No. 2008-242614, and Japanese Laid-open Patent Publication No. 2010-108324.
According to an aspect of the invention, a method includes: generating first information for each of a plurality of jobs based on temperature information acquired from a first information processing device that has executed each of the plurality of jobs, each of the first information indicating change amount of the temperature of the first information processing device when each corresponding job is executed by the first information processing device; generating second information for each of a plurality of second information processing devices that have executed a specific one of the plurality of jobs, based on temperature information acquired from each of the second information processing devices, the second information indicating change amount of the temperature of each of the second information processing devices when the specific job is executed by each of the second information processing devices; and determining which one of the plurality of second information processing devices to allocate at least one of the plurality of jobs, based on the temperature information of the plurality of second information processing devices, the first information for the plurality of jobs, and the second information for the plurality of second information processing devices.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
According to one aspect, it is an object of embodiments to provide a technique for appropriately allocating a job based on a server temperature.
The preprocessing section 11 calculates a temperature rise (° C.) of the test server 3 for each job net based on the data stored in the first data storage section 14, and stores the calculation result into the second data storage section 15. Also, based on the data stored in the second data storage section 15, the preprocessing section 11 calculates, for each execution server, a temperature rise (° C.) caused by the execution of a job net selected based on a predetermined condition (hereinafter referred to as the reference job net), and stores the calculation result into the second data storage section 15. The execution management section 12 executes processing such as allocating jobs to the execution servers A to P based on the data stored in the second data storage section 15, and stores the processing result into the third data storage section 16.
The feedback processing section 13 executes a processing based on the data stored in the third data storage section 16, and stores the processing result into the fourth data storage section 17.
Since the temperature differs depending on where it is measured, a representative value representing the temperature of the test server 3 (hereinafter referred to as the server temperature) is defined. In this embodiment, the server temperature is calculated by s=αc+(1−α)h, where s is the server temperature, c is the CPU temperature, and h is the HDD temperature. α is a real number satisfying 0<α<1 and is predetermined by the administrator. However, the method of calculating the server temperature is not limited to the method described herein. Similarly, the execution servers A to P are not limited to the execution servers described herein.
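The weighted combination above can be sketched in a few lines of Python. This is only an illustration: the function name `server_temperature` and the sample temperatures are assumptions introduced here and do not appear in the embodiment.

```python
def server_temperature(cpu_temp, hdd_temp, alpha):
    """Representative server temperature s = alpha*c + (1 - alpha)*h."""
    # The embodiment requires 0 < alpha < 1 (chosen by the administrator).
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return alpha * cpu_temp + (1.0 - alpha) * hdd_temp


# Hypothetical readings: CPU at 50 deg C, HDD at 40 deg C, equal weighting.
s = server_temperature(cpu_temp=50.0, hdd_temp=40.0, alpha=0.5)
```

With equal weighting the result is simply the mean of the two readings; a larger α makes the server temperature follow the CPU temperature more closely.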
As illustrated in
The hardware configuration of each of the execution servers A to P is the same as the hardware configuration of the test server 3.
Next, processings executed by the management device 1 are described with reference to
The preprocessing section 11 reads an execution schedule for prepositioning and job net data from the first data storage section 14 (
The preprocessing section 11 transmits an execution request for each job net to the test server 3 according to the execution schedule for prepositioning (S3). When the test server 3 does not have a batch file for job execution, the execution request includes a batch file for job execution. Assume that multiple job nets are not executed in parallel in the preprocessing. After a job net has been executed, a sufficient time may be secured before the next execution so that the next execution starts in a state where the temperature of the test server 3 has dropped completely.
The test server 3 executes the job net. The temperature sensor 310 and the temperature sensor 330 in the test server 3 measure the CPU temperature and the HDD temperature at regular intervals (for example, every second), and the test server 3 transmits data of the measured CPU temperature and HDD temperature to the management device 1. The test server 3 may transmit the data of the CPU temperature and the data of the HDD temperature in a batch for each job net. Also, the test server 3 transmits execution result data including the start time and the end time of each job to the management device 1.
The preprocessing section 11 sequentially acquires data of the CPU temperature, data of the HDD temperature, and execution result data from the test server 3. Then, the preprocessing section 11 generates time series data of the CPU temperature and time series data of the HDD temperature (S5), and stores the time series data of the CPU temperature, the time series data of the HDD temperature, and the execution result data into the second data storage section 15.
Referring back to
Here, a processing of S7 is described. For example, assume that after the start of the execution of a job net (in this case, a job net J), the CPU temperature has risen as illustrated in
In the example of
Since the temperature has risen by (1.0-0.6) ° C. during the 10 seconds after 10 seconds have elapsed from the start of the execution of the job net J, a formula 2 as illustrated below holds.
Since the temperature has risen by (1.2-1.0) ° C. during the 10 seconds after 20 seconds have elapsed from the start of the execution of the job net J, a formula 3 as illustrated below holds.
Since the temperature has risen by (1.3-1.2) ° C. during the 10 seconds after 30 seconds have elapsed from the start of the execution of the job net J, a formula 4 as illustrated below holds.
By solving the above simultaneous equations, CA, CB, CC, and CD can be determined. As a result, the preprocessing section 11 generates first CPU data as illustrated in
In the example of
Similarly, assume that when the execution of the job net J is started, the HDD temperature has risen as illustrated in
In the example of
Since the temperature has risen by (1.1-0.8) ° C. during the 10 seconds after 10 seconds have elapsed from the start of the execution of the job net J, a formula 6 as illustrated below holds.
Since the temperature has risen by (1.2-1.1) ° C. during the 10 seconds after 20 seconds have elapsed from the start of the execution of the job net J, a formula 7 as illustrated below holds.
Since the temperature has risen by (1.3-1.2) ° C. during the 10 seconds after 30 seconds have elapsed from the start of the execution of the job net J, a formula 8 as illustrated below holds.
By solving the above simultaneous equations, HA, HB, HC, and HD can be determined. As a result, the preprocessing section 11 generates first HDD data as illustrated in
Referring back to
The preprocessing section 11 calculates a temperature rise (° C.) for each of job nets from a temperature rise for each of jobs (S11). As a result of the processings up to S11, the preprocessing section 11 generates first temperature data as illustrated in
Then, the process ends.
By executing the processings described above, the temperature rise caused by the execution of each job net can be estimated in advance on a job net basis. Since every job net is executed by the single test server 3, the estimates are not affected by variations in the temperature rise due to differences in the performance or installation place of the server.
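The per-job temperature rises determined in S7 come from a small system of simultaneous linear equations, one equation per 10-second interval. The following Python sketch shows one way such a system could be set up and solved. The CPU rises per interval (0.6, 0.4, 0.2, and 0.1 ° C.) are taken from the description above; the job schedule, the assumption that each job heats the CPU evenly over its duration, and all function names are hypothetical, since the actual formulas depend on the job net illustrated in the drawings.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: job rises cannot be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


INTERVAL = 10  # seconds per measurement interval, as in the description

# Hypothetical (start, end) times in seconds for jobs A to D of job net J.
schedule = {"A": (0, 20), "B": (0, 10), "C": (10, 30), "D": (20, 40)}

# CPU temperature rise observed in each 10-second interval (from the text).
interval_rises = [0.6, 0.4, 0.2, 0.1]


def overlap_fraction(start, end, k):
    """Fraction of a job's duration that falls inside interval k.

    Assumption: a job spreads its total rise evenly over its duration.
    """
    lo = max(start, k * INTERVAL)
    hi = min(end, (k + 1) * INTERVAL)
    return max(0, hi - lo) / (end - start)


jobs = sorted(schedule)
coeffs = [[overlap_fraction(*schedule[j], k) for j in jobs]
          for k in range(len(interval_rises))]
rises = dict(zip(jobs, solve_linear(coeffs, interval_rises)))
```

Under this hypothetical schedule the solution is approximately CA=0.6, CB=0.3, CC=0.2, and CD=0.2 ° C., which sums to the total observed rise of 1.3 ° C. The same procedure would apply to the HDD temperature with HA to HD as the unknowns.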
Next, a processing of calculating a temperature rise for each of execution servers by the execution of the reference job net by the preprocessing section 11 of the management device 1 is described with reference to
First, the preprocessing section 11 generates second temperature data including the job net name and the temperature rise by the execution of the job net from the first temperature data stored in the second data storage section 15. Then, the preprocessing section 11 sorts entries of the second temperature data in the descending order using the temperature rise by the execution of the job net as a key (
Referring back to
The preprocessing section 11 transmits an execution request of the reference job net to each of the execution servers A to P (S25). When the execution servers A to P do not have a batch file for the reference job net, the execution request includes a batch file for the reference job net. In response to the execution request, each of the execution servers A to P executes the reference job net, and sequentially transmits data of the server temperature to the management device 1.
The preprocessing section 11 sequentially acquires data of the server temperature and execution result data from the execution servers A to P (S27), and stores them in the second data storage section 15. A format of data of the server temperature stored in the second data storage section 15 is the same as the format illustrated in
The preprocessing section 11 calculates a temperature rise due to the reference job net for each of the execution servers A to P by calculating, based on the data of the server temperature and the execution result data stored in the second data storage section 15, a difference between the server temperature at the execution start time of the reference job net and the server temperature at the execution end time of the reference job net (S29). Then, the preprocessing section 11 generates first server data including an identifier of the execution server and the temperature rise due to the reference job net, and stores it in the second data storage section 15.
Referring back to
Thus, by causing each of execution servers to execute the reference job net, an index representing how easily the temperature rises is acquired.
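The calculation of S29 amounts to taking the difference between two samples of the server temperature time series. A minimal Python sketch follows; the time series layout as (time, temperature) pairs, the server names, and the helper names are assumptions made for illustration only.

```python
def temperature_at(series, t):
    """Return the latest sample taken at or before time t.

    `series` is assumed to be a list of (time, temperature) pairs
    sorted in ascending order of time.
    """
    candidates = [temp for ts, temp in series if ts <= t]
    if not candidates:
        raise ValueError("no sample at or before the requested time")
    return candidates[-1]


def reference_rise(series, start, end):
    """Temperature rise between the start and end of the reference job net."""
    return temperature_at(series, end) - temperature_at(series, start)


# Hypothetical server temperature series for two execution servers.
series_by_server = {
    "svA": [(0, 30.0), (10, 30.5), (20, 31.0)],
    "svB": [(0, 32.0), (10, 33.0), (20, 34.5)],
}

# First server data: identifier -> rise due to the reference job net.
first_server_data = {
    name: reference_rise(series, start=0, end=20)
    for name, series in series_by_server.items()
}
```

In this hypothetical data, svB heats up more than svA for the same reference job net, so svB would be treated as the server whose temperature rises more easily.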
Next, a processing executed when the execution management section 12 of the management device 1 actually allocates a job to the execution servers A to P is described with reference to
First, the execution management section 12 acquires data of the server temperature from the execution servers A to P (
In this step, the execution management section 12 generates second server data as illustrated in
Further, the execution management section 12 generates third server data as illustrated in
Referring back to
The execution management section 12 identifies, in ascending order of the predicted value, the execution servers ranked within a predetermined rank, or a predetermined number or fewer of the execution servers (S42). Then, the execution management section 12 identifies, among the identified execution servers, the execution server having the smallest temperature rise due to the reference job net (that is, the execution server whose temperature is least likely to rise) (S43). For example, if the third server data is data as illustrated in
If designated data is stored in the fourth data storage section 17, the execution management section 12 may determine in S43 whether the identified execution server satisfies a condition specified in the designated data. The designated data is described later.
The execution management section 12 transmits the execution request of a job net identified in S41 to the execution server identified in S43 (S45). When the execution server does not have a batch file for the job net, the execution request includes a batch file for the job net.
The execution management section 12 updates third server data stored in the third data storage section 16 (S47). Specifically, a temperature rise by the execution of a job net to be executed is added to a server temperature of the execution server identified in S43, as illustrated in
The execution management section 12 sorts entries of the third server data stored in the third data storage section 16 in the ascending order using the predicted value as a first key, and the temperature rise by the execution of the reference job net as a second key (S49). By the processing of S49, the third server data is updated as illustrated in
The execution management section 12 determines whether there is a job net to be executed next, based on an actual execution schedule stored in the first data storage section 14 (S51). If there is a job net to be executed next (route of S51: Yes), the process returns to S41.
If there is no job net to be executed next (route of S51: No), the execution management section 12 acquires data of the server temperature from each execution server when the execution of a job net therein ends (S53). Then, the execution management section 12 generates fourth server data as illustrated in
Then, the process ends.
By executing such processings, a job is unlikely to be allocated to an execution server whose present temperature is relatively high or whose temperature is likely to rise. As a result, it is less likely that the temperature of one execution server rises significantly above the temperatures of the other execution servers. That is, the temperatures of the execution servers A to P may be leveled.
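The loop of S41 to S49 can be sketched in Python as follows. This is a simplified illustration, not the implementation of the execution management section 12: the function name `allocate`, the parameter `top_n` (standing in for the predetermined number), and the sample values are hypothetical, the designated data is ignored, and ties in S43 are resolved by sort order because the description does not specify a tie-breaking rule.

```python
def allocate(job_nets, job_rise, predicted, ref_rise, top_n):
    """Allocate each job net to an execution server.

    job_rise:  job net name -> estimated temperature rise (first temperature data)
    predicted: server name -> current server temperature (initial predicted value)
    ref_rise:  server name -> rise due to the reference job net (first server data)
    Returns the list of (job net, server) pairs and the updated predicted values.
    """
    predicted = dict(predicted)  # work on a copy; the caller's data is untouched
    assignments = []
    for net in job_nets:  # S41: take the next job net to be executed
        # S49 ordering: predicted value first, reference rise second.
        ranked = sorted(predicted, key=lambda s: (predicted[s], ref_rise[s]))
        candidates = ranked[:top_n]  # S42: coolest servers by predicted value
        target = min(candidates, key=lambda s: ref_rise[s])  # S43
        assignments.append((net, target))  # S45: execution request goes here
        predicted[target] += job_rise[net]  # S47: update the predicted value
    return assignments, predicted


# Hypothetical data for three execution servers and two job nets.
assignments, final = allocate(
    job_nets=["net1", "net2"],
    job_rise={"net1": 1.5, "net2": 0.5},
    predicted={"svA": 30.0, "svB": 29.0, "svC": 31.0},
    ref_rise={"svA": 1.0, "svB": 2.0, "svC": 0.5},
    top_n=2,
)
```

In this example svB is the coolest server but heats up most easily, so net1 goes to svA instead. After the predicted value of svA is raised, svC enters the candidate set and receives net2.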
Thus, the execution servers A to P can be operated stably in the server room 5, and service life of the execution servers A to P can be prolonged.
Since occurrence of the hot spot can be suppressed, power consumption by the air conditioner 50 can be reduced by raising the set temperature of the air conditioner 50.
Next, processings executed by the feedback processing section 13 are described with reference to
First, the feedback processing section 13 calculates the difference between the predicted value and the measured value for each of execution servers based on the fourth server data stored in the third data storage section 16 (
The feedback processing section 13 sorts entries of the fifth server data in the descending order using the difference between the predicted value and the measured value as a key (S63). By sorting entries of the fifth server data illustrated in
The feedback processing section 13 outputs fifth server data, for example, to a display device of the management device 1 (S65). From this data, the administrator can easily identify an execution server having a large divergence between the predicted value and the measured value.
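The processing of S61 to S65 can be sketched in Python as follows. The use of the absolute difference is an assumption (the description only says "difference"), and the function name and sample values are hypothetical.

```python
def divergence_report(predicted, measured):
    """Build rows of (server, predicted, measured, difference), largest first.

    Assumption: the difference is taken as an absolute value so that both
    over- and under-prediction rank high.
    """
    rows = [
        (server, predicted[server], measured[server],
         abs(predicted[server] - measured[server]))
        for server in predicted
    ]
    # S63: sort in descending order using the difference as the key.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical fourth server data: predicted and measured temperatures.
report = divergence_report(
    predicted={"svA": 31.5, "svB": 29.0, "svC": 31.5},
    measured={"svA": 33.0, "svB": 29.5, "svC": 31.5},
)

# S65: output, for example to a display device.
for server, pred, meas, diff in report:
    print(f"{server}: predicted={pred} measured={meas} difference={diff}")
```

The server at the top of the report is the first candidate for the administrator to examine when preparing the designated data.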
The feedback processing section 13 receives, from the administrator, designation of a job net to be executed by the execution server, and designation of a job net not to be executed by the execution server (S67), and generates designated data. Then, the feedback processing section 13 stores the designated data into the fourth data storage section 17.
Then, the process ends.
One of the reasons why the difference between predicted and measured values of the temperature is large is that execution of a job by an execution server having a high processing load takes a prolonged time period. As an example, execution of a job net including a job A, a job B, and a job C is considered. A change of the server temperature when the execution of each of jobs ends without taking a prolonged time period is illustrated in
Here, an example of the execution of the job A prolonged longer than normal is illustrated in
The administrator identifies a job net causing a divergence between the predicted value and the measured value by referring to the output fifth server data, the execution history of the execution servers, and so on. For example, if an execution server executing a job net (in this case, a job net “netK”) including the job A is an execution server “svM” in the example of
By executing the processings described above, the management device 1 can allocate a job based on the designated data from a next time and thereafter. Thus, the server temperatures of the execution servers A to P may be further leveled. For example, the fifth server data as illustrated in
In the second embodiment, a method of determining the temperature rise more precisely is described.
For example, assume that the CPU temperature has risen by 0.6° C. by executing a job A for 20 seconds. According to the first embodiment, when no other job is executed during the execution of the job A, the temperature rise by the execution of the job A is 0.6° C.
Here, assume that the CPU use rate of the job A is 20% during the first 10 seconds of the 20 seconds and 10% during the remaining 10 seconds. In such a case, the temperature rise in the first half and the temperature rise in the second half are considered to differ. Specifically, the temperature rise for the 10 seconds of the first half is 0.6*(20/(20+10))=0.4° C., and the temperature rise for the 10 seconds of the second half is 0.6*(10/(20+10))=0.2° C.
By considering the CPU use rate as described above, the CPU temperature rise may be calculated more precisely. Similarly, by considering an input/output (IO) use rate, temperature rise of the HDD may be calculated more precisely.
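The proportional split in the example above can be written as a short Python function. The function name `split_rise_by_use_rate` is hypothetical; the numbers are those of the job A example.

```python
def split_rise_by_use_rate(total_rise, use_rates):
    """Distribute a total temperature rise over intervals in proportion
    to the use rate observed in each interval."""
    total = sum(use_rates)
    if total <= 0:
        raise ValueError("at least one interval must have a positive use rate")
    return [total_rise * rate / total for rate in use_rates]


# Job A: 0.6 deg C over 20 seconds, CPU use rate 20% then 10%.
first_half, second_half = split_rise_by_use_rate(0.6, [20, 10])
```

The two values come out at approximately 0.4 ° C. and 0.2 ° C., matching the figures given in the description. The same function could be applied to the HDD temperature rise with IO use rates in place of CPU use rates.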
In the second embodiment, the CPU temperature rise and the HDD temperature rise are calculated by using the CPU use rate and the IO use rate.
Specifically, the preprocessing section 11 acquires data of the CPU use rate and data of the IO use rate in addition to data of the CPU temperature and data of the HDD temperature in S5. Then, the preprocessing section 11 generates time series data of the CPU use rate and time series data of the IO use rate, and stores them in the second data storage section 15. The CPU use rate and the IO use rate may be acquired by a function of an operating system (OS) of the test server 3.
Then, in S7, simultaneous equations are generated by further using the CPU use rate. For example, in the example illustrated in
For 10 seconds after the start time, since temperature has risen by (0.6-0.0) ° C., a formula 9 as illustrated below holds. In the formula, CA represents the temperature rise of the CPU by the execution of the job A, CB represents the temperature rise of the CPU by the execution of the job B, CC represents the temperature rise of the CPU by the execution of the job C, and CD represents the temperature rise of the CPU by the execution of the job D. UX,k represents the CPU use rate of a job X after k seconds from the start of the job X. Assume that the duration of temperature drop by exhaust heat illustrated in
Since the temperature has risen by (1.0-0.6) ° C. during the 10 seconds after 10 seconds have elapsed from the start of the execution of the job net J, a formula 10 as illustrated below holds.
Since the temperature has risen by (1.2-1.0) ° C. during the 10 seconds after 20 seconds have elapsed from the start of the execution of the job net J, a formula 11 as illustrated below holds.
Since the temperature has risen by (1.3-1.2) ° C. during the 10 seconds after 30 seconds have elapsed from the start of the execution of the job net J, a formula 12 as illustrated below holds.
By solving the above simultaneous equations, CA, CB, CC, and CD can be determined. Description of the HDD temperature rise is omitted since it may be determined by performing similar calculations using data of the IO use rate.
By performing the processings as described above, the CPU temperature rise and the HDD temperature rise may be calculated more precisely.
In the third embodiment, another example of the processing executed by the execution management section 12 is described.
First, the execution management section 12 acquires data of the server temperature from the execution servers A to P (
The execution management section 12 identifies one job net to be executed, based on an actual execution schedule stored in the first data storage section 14 (S71). This processing is the same as the processing of S41.
From the third server data, the execution management section 12 identifies execution servers each having a predicted value different by a predetermined value or less from the predicted value of the execution server having the lowest predicted value (S72). Then, the execution management section 12 identifies an execution server having the smallest temperature rise among the identified execution servers (S73). Description of subsequent processings is omitted as the processings are the same as the processings in S45 to S53.
Even with the processings described above, the temperatures of the execution servers A to P may be leveled.
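The selection of S72 and S73 differs from the first embodiment only in how the candidate servers are chosen. A minimal Python sketch follows; the function name, the parameter `margin` (standing in for the predetermined value), and the sample data are hypothetical.

```python
def pick_server_within_margin(predicted, ref_rise, margin):
    """Pick a server among those whose predicted value is close to the lowest.

    predicted: server name -> predicted value (third server data)
    ref_rise:  server name -> rise due to the reference job net
    margin:    allowed difference from the lowest predicted value
    """
    lowest = min(predicted.values())
    # S72: servers whose predicted value exceeds the lowest by at most margin.
    candidates = [s for s, value in predicted.items() if value - lowest <= margin]
    # S73: among them, the server whose temperature rises least easily.
    return min(candidates, key=lambda s: ref_rise[s])


# Hypothetical data for three execution servers.
predicted = {"svA": 30.0, "svB": 29.0, "svC": 31.0}
ref_rise = {"svA": 1.0, "svB": 2.0, "svC": 0.5}

narrow = pick_server_within_margin(predicted, ref_rise, margin=1.0)
wide = pick_server_within_margin(predicted, ref_rise, margin=2.0)
```

With the narrow margin only svA and svB qualify and svA is chosen; widening the margin admits svC, which then wins because it has the smallest reference rise. Unlike a fixed number of candidates, this rule never considers a server that is much hotter than the coolest one.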
Although the embodiments of the present disclosure are described above, the present disclosure is not limited to them. For example, the management device 1 described above may have a functional block configuration different from a modular configuration of an actual program.
Also, the configuration of each table described above is just an example, and is not limited thereto. Further, in each processing flow, the order of processings may be changed as long as the processing result does not change. Further, processings may be executed in parallel.
Values stored in each table are just an example, and are not limited thereto. Also, predicted values and measured values illustrated above do not represent actual operation results.
Although the server temperature is used when allocating the job in the embodiment described above, a job execution status (for example, processing load or queue status) may be used in place of the server temperature.
Although temperature drop due to the exhaust heat is not considered when calculating the temperature rise of the CPU and the temperature rise of the HDD in the embodiment described above, the temperature rise of the CPU and the temperature rise of the HDD may be calculated in consideration of temperature drop due to the exhaust heat, for example, by introducing a term of the exhaust heat into the formula.
The management device 1 described above is a computer device in which a memory 2501, a central processing unit (CPU) 2503, a hard disk drive (HDD) 2505, a display controller 2507 coupled to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication controller 2517 for coupling to a network are coupled via a bus 2519, as illustrated in
The embodiments of the present disclosure described above are summarized below.
A method of allocating a job according to the embodiments comprises: (A) generating information of a first change amount for each of a plurality of jobs based on temperature information of a first information processing device that has executed the plurality of jobs, the first change amount being an amount of temperature changed when each job is executed, the temperature information being acquired from the first information processing device; (B) generating information of a second change amount for each of a plurality of second information processing devices that have executed a specific one of the plurality of jobs, based on temperature information of each second information processing device acquired from that second information processing device, the second change amount being an amount of temperature changed when the specific job is executed; and (C) determining which one of the plurality of second information processing devices each of the plurality of jobs is to be allocated to, based on the temperature information of the plurality of second information processing devices, the information of the first change amounts for the plurality of jobs, and the information of the second change amounts for the plurality of second information processing devices.
The second change amount generated as described above indicates how easily the temperature of the second information processing device changes. This provides a countermeasure of avoiding job allocation to a second information processing device in which the temperature tends to change. Therefore, the jobs may be allocated appropriately based on the temperatures.
In the above step of determining which one of the plurality of second information processing devices each of the plurality of jobs is to be allocated to, (c1) for each of the plurality of jobs, second information processing devices ranked within a predetermined rank, or a predetermined number or fewer of second information processing devices, in ascending order of the temperature of the second information processing device may be identified among the plurality of second information processing devices; and a second information processing device to which the job is to be allocated may be determined based on the second change amounts among the identified second information processing devices. In this way, jobs may be allocated such that the temperatures of the plurality of second information processing devices can be leveled.
In the above step of determining which one of the plurality of second information processing devices each of the plurality of jobs is to be allocated to, (c2) for each of the plurality of jobs, second information processing devices each having a temperature different by a predetermined value or less from a temperature of a second information processing device that has the lowest temperature may be identified among the plurality of second information processing devices; and a second information processing device to which the job is to be allocated may be determined based on the second change amounts among the identified second information processing devices. In this way, jobs may be allocated such that the temperatures of the plurality of second information processing devices are leveled.
The job allocation method may further include (D) a step of, for each of the plurality of jobs, updating the temperature information of the second information processing device to which the job is to be allocated, such that the temperature information indicates a value obtained by adding the first change amount for the job to the temperature of the second information processing device to which the job is to be allocated. In this way, a job allocation may be performed in consideration of temperature changes due to already allocated jobs.
The job allocation method may further include a step of acquiring information of a start time and an end time for each of the plurality of jobs from the first information processing device. In the above step of generating a first change amount for each of the plurality of jobs, (a1) the first change amount may be generated for each of the plurality of jobs by calculating a difference between a temperature of the first information processing device at the start time of the job and a temperature of the first information processing device at the end time of the job. Thus, the information of the first change amount may be generated appropriately.
The temperature information of the first information processing device may be information generated based on temperature information of a processor in the first information processing device and temperature information of a storage device in the first information processing device. Since the temperatures of the processor and the storage device change particularly when a job is executed, the temperature information may be generated more appropriately by using the above method.
The job allocation method may further include (E) a step of acquiring information of a use rate of a processor and information of IO use rates for a storage device from the first information processing device. In the above step of generating the information of the first change amount for each of the plurality of jobs, (a2) the information of the first change amount which is an amount of temperature changed when a job is executed may be generated for each of the plurality of jobs based on the temperature information of the first information processing device, the information of the use rate of the processor, and the information of the input and output use rates for a storage device. Thus, information of the first change amount may be generated more precisely.
The job allocation method may further include (F) a step of acquiring the temperature information of the second information processing devices from the plurality of second information processing devices, when execution of the plurality of jobs completes; and (G) generating information of a difference between the temperature in the acquired temperature information for each of the plurality of second information processing devices, and the temperature in the updated temperature information of the second information processing device; and outputting the generated information of the difference. Thus, the difference between the calculated temperature and the actual temperature may be checked by an administrator or the like.
A program for causing a computer to execute the above processings may be created, and the program may be stored, for example, in a computer readable storage medium or storage device such as a flexible disk, an optical disk such as a CD-ROM, a magneto-optical disk, a semiconductor memory (for example, a ROM), or a hard disk. Intermediate processing results may be temporarily stored in a storage device such as a memory.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2014-265796 | Dec 2014 | JP | national |

Number | Name | Date | Kind |
---|---|---|---|
7461273 | Moore | Dec 2008 | B2 |
9135063 | Ghose | Sep 2015 | B1 |
9152472 | Kazama | Oct 2015 | B2 |
20050278520 | Hirai et al. | Dec 2005 | A1 |
20060095911 | Uemura et al. | May 2006 | A1 |

Number | Date | Country |
---|---|---|
2006-133995 | May 2006 | JP |
2008-242614 | Oct 2008 | JP |
2010-108324 | May 2010 | JP |
WO 03083693 | Oct 2003 | WO |

Number | Date | Country |
---|---|---|
20160187018 A1 | Jun 2016 | US |