1. Field of the Invention
The present invention relates to technology for appropriately selecting a server to execute a batch job and for efficiently distributing load in a multiple-server environment in which a plurality of servers that execute batch jobs are present.
2. Description of the Related Art
There has been a previous method to improve throughput by distributing a plurality of batch jobs across a plurality of servers and causing the servers to execute the distributed batch jobs. It is possible to determine the distribution statically; however, dynamic distribution can achieve a more efficient load distribution.
A system described in Patent Document 1 monitors the load statuses of a plurality of servers executing batch jobs. When execution of a batch job is requested, the system classifies the batch job into a type (such as a "CPU resource using type", a type that mainly uses CPU resources rather than memory and I/O resources) based on a preset resource usage characteristic of the batch job, and selects a server whose load status is appropriate for executing that type of job. A similar system is disclosed in Patent Document 2.
In a batch job system, unlike an online job system, batches of input data are processed together. Therefore, a batch job has a characteristic such that, when the same batch job is executed multiple times with different input data volumes, the amount of computer resources used and the execution time depend on the input data volume (the number of transactions).
In many cases, to process a large input data volume, execution of a batch job requires a long time, for example one to two hours. Thus, there is a high probability that a server with a low load when the batch job starts may have a high load while executing the batch job due to various factors, including factors other than the batch job. If the system causes such a server to execute the batch job based only on the server load status at the start of the batch job, optimal distribution cannot be achieved.
The systems described in Patent Document 1 and Patent Document 2, however, do not take into account the time required for batch job execution. Additionally, the server load status used to determine the batch job distribution is only the load status obtained immediately before or after the batch job execution request.
In the systems of Patent Document 1 and Patent Document 2, it is crucial to obtain the batch job characteristics properly. However, due to the amount of time and effort required, the conventional systems have difficulty obtaining the batch job characteristics in the first place, for two reasons. First, because there is no standard system or tool that comprehensively visualizes the factors of batch job process time, such as the processed data volume, user resource conflicts, system resource conflicts, and the waiting time resulting from those conflicts, a user needs to develop an application program on his/her own in order to obtain the batch job characteristics. Second, although a server comprises a standard function to calculate the system load for each process, the calculation for each batch job requires manual effort, or a user needs to create a specific application program.
Patent Document 1: Japanese Patent Application Publication No. 10-334057
Patent Document 2: Japanese Patent Application Publication No. 4-34640
It is an object of the present invention to select an optimal server over the period of time required for the execution of a batch job when selecting a server to execute the batch job in a multiple-server environment where a plurality of servers executing batch jobs are present. It is another object of the present invention to reduce the difficulty of obtaining the batch job characteristics by automatically recording the batch job characteristics used in the selection.
The program according to the present invention is used in a batch job receiving computer for selecting a computer (i.e. server) to execute a batch job from a plurality of computers. The program according to the present invention causes the batch job receiving computer to predict the execution time required for the execution of the batch job based on a characteristic of the batch job and input data volume provided to the batch job. The batch job receiving computer also predicts each of the load statuses of a plurality of the computers in a time range with a scheduled batch job execution start time as a starting point and with a predicted execution time period. It additionally causes the batch job receiving computer to select a computer to execute the batch job from a plurality of the computers based on the predicted load status.
Preferably, the program according to the present invention further causes the batch job receiving computer to update the batch job characteristic based on information relating to a load that occurs when the batch job is executed by the above selected computer.
According to the present invention, a server load status is predicted not at a single point in time but over a time period, and a server to execute the batch job is selected based on that prediction. The time period is determined by predicting the time required for the batch job execution. Therefore, it is possible to select an appropriate server for a batch job that requires a long execution time, even in an environment where the load statuses of a plurality of servers change over time. Consequently, it is possible to distribute batch jobs more efficiently than in the past in a multiple-server environment.
Because the batch job characteristics are generated and updated automatically, potential problems, such as the effort required of a system administrator etc. to obtain batch job characteristics, can be reduced. Furthermore, the reliability of the recorded batch job characteristics is enhanced as the collected volume of data representing the batch job characteristics increases. Therefore, the accuracy of the determination in selecting a server to execute the batch job can be improved, realizing more efficient operation.
In the following description, details of the embodiments of the present invention are set forth with reference to the drawings.
In addition, for every batch job execution by the selected server, the program measures and records the load resulting from the execution and updates the batch job characteristics based on the recorded data.
In the following description, first, the outline of a method for selecting a server to execute a batch job is explained with reference to
Suppose that there is a batch job scheduled to be started from a time t1. As in the example of
For the purpose of simplifying the explanation, this description assumes that the difference between the hardware performance of server A and that of server B is negligible. Then, the predicted time required to execute a batch job on server A is also the predicted time required to execute the batch job on server B (the prediction method is explained later). The predicted time is designated as d, and a time t2 is defined as t2=t1+d. The range between time t1 and time t2 is the predicted time range from the execution start to the execution completion of the batch job. This predicted time range is hereinafter referred to as the batch job execution range. The present invention takes into account the load of each of server A and server B in the batch job execution range and selects a server to execute the batch job. In the example of
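As an illustrative supplement (not part of the original disclosure; all function and parameter names are hypothetical), the following Python sketch shows one way the execution range [t1, t2) could be derived from the predicted duration d and used to pick the server whose mean predicted load over that range is lowest, which mirrors the selection idea described above.

    from datetime import datetime, timedelta

    def select_server(t1: datetime, d_seconds: float, predicted_load: dict) -> str:
        """Pick the server with the lowest mean predicted load in [t1, t2).

        predicted_load maps a server name to a function returning the predicted
        load (e.g. CPU utilization in %) at a given datetime (assumed interface).
        """
        t2 = t1 + timedelta(seconds=d_seconds)   # end of the batch job execution range
        step = timedelta(minutes=10)             # sampling interval (assumed)

        def mean_load(load_at) -> float:
            samples, t = [], t1
            while t < t2:                        # sample the predicted load over the range
                samples.append(load_at(t))
                t += step
            return sum(samples) / max(1, len(samples))

        return min(predicted_load, key=lambda s: mean_load(predicted_load[s]))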
It should be noted that in the graph of
If the load generated by the batch job execution changes significantly (increases or decreases) within the execution range, matching the trend of that change against the trend of the server load change in the execution range needs to be considered when selecting a server to execute the batch job. In practice, however, the load caused by one batch job execution does not change significantly in many cases (
The receiving server 102 is a server computer that has a function to schedule batch jobs (hereinafter referred to as the "scheduling function"). Each of the execution servers 103-1, 103-2, . . . , 103-N is a server computer that has a function to execute batch jobs (hereinafter referred to as the "execution function"). In the following description, it is mainly assumed that the difference in performance among the execution servers 103-1, 103-2, . . . , 103-N is negligible. An example of such a situation is a case where execution servers with similar performance are managed as clustered servers. The repository 104 is provided on a disk device (storage device), storing various data (
The scheduling function is present in one physical server (that is the receiving server 102). The execution function is present in more than one physical server (that is the execution servers 103-1, 103-2, . . . , 103-N). The receiving server 102 may be physically identical with one of the servers of the execution server group 103, or may be different from any of the servers in the execution server group 103.
The format of the disk device provided with the repository 104 has to be a format that can be referred to by each server (the receiving server 102 and the execution servers 103-1, 103-2, . . . , 103-N) using the repository 104. However, the format does not have to be versatile; it can be a format unique to the batch system 101.
The repository 104 stores data indicating the system operation state (hereinafter referred to as “operation data”), data indicating characteristics of the batch job (hereinafter referred to as “batch job characteristics”), data indicating the server load status (hereinafter referred to as “server load information”), and rules for selecting an execution server to execute the batch job (hereinafter referred to as “distribution conditions”).
Each of the above kinds of information in the repository 104 may be stored in a single file or in a plurality of separate files. The disk device provided with the repository 104 can be a disk device physically different from any of the local disks of the receiving server 102 and the execution servers 103-1, 103-2, . . . , 103-N, or can be a disk device physically identical with the local disk of any of the servers. It is also possible for the repository 104 to be physically divided across more than one disk device. For example, the batch job characteristics and the distribution conditions may be stored on a local disk of the receiving server 102, and the operation data and the server load information may be stored on a disk device that is physically different from any of the server local disks.
The operation data is data for managing the history of batch job execution and the history of the server load. An example of the operation data is shown in
The batch job characteristics are generated by extracting the data for each batch job from the operation data shown in
The server load information is information managing the load of each of the execution servers 103-1, 103-2, . . . , 103-N for each period of time. The repository 104 may store items such as the amount of CPU usage, the CPU utilization, the amount of memory usage, the memory utilization, the average waiting time of physical I/O, the amount of file usage, and free space of a storage device as the server load information. Among the above items, the necessary items are stored in the repository 104 as the server load information depending on the embodiment. An example of server load information is shown in
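Purely as an illustration of the four kinds of stored data described above (the layout and key names below are hypothetical and not part of the original disclosure; the example per-transaction figures are derived from the JOB 1 values appearing later in this description), the repository contents could be pictured in Python as follows.

    # Hypothetical in-memory picture of the repository 104 contents.
    repository = {
        "operation_data": [],            # history records of job execution and server load
        "batch_job_characteristics": {
            # per job name: per-transaction figures (illustrative values for JOB 1)
            "JOB 1": {"time_per_tx": 3.3, "cpu_sec_per_tx": 0.6, "mem_mb_per_tx": 0.0045},
        },
        "server_load_info": {
            # per execution server: load measured for each 10-minute time slot
            "server-1": {"00:00": {"cpu_util": 12.0}, "00:10": {"cpu_util": 15.0}},
        },
        "distribution_conditions": ["select the server with the lowest predicted CPU utilization"],
    }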
The distribution conditions hold rules referred to when selecting a server to execute a batch job.
The receiving server 102 includes four subsystems of a job receiving subsystem 105 for receiving the batch job execution request, a job distribution subsystem 106 for selecting an execution server to execute the job, an operation data extraction subsystem 107 for recording the operation data, and a job information update subsystem 108 for updating the batch job characteristics. These four subsystems are linked with each other.
Each of the execution servers 103-1, 103-2, . . . , 103-N includes four subsystems of a job execution subsystem 109 for executing the batch job, an operation data extraction subsystem 110 for recording the operation data, a performance information collection subsystem 111 for collecting server load information, and a server information extraction subsystem 112 for updating the contents of the repository 104 based on the collected server load information. These four subsystems are linked with each other.
Each of the receiving server 102 and the execution servers 103-1, 103-2, . . . , 103-N thus has four subsystems, and the four subsystems may be realized by four independent programs operating in coordination or by one program comprising the four functions. Alternatively, a person skilled in the art can implement the subsystems in various embodiments, such as combining two or three functions into one program or realizing one function by a plurality of linked programs. Details of the contents of the processes performed by the four subsystems are described hereinafter.
The
In the example of
The content of a second record has “2006/02/01 10:00:00.050” in the storage date and time and “20” (a code indicating the prediction of an execution time and load of the batch job) in the record type, “JOB 1” in the data item 1, “1000” (the number of transactions i.e. the number of input data of JOB 1) in data item 2, “3300 seconds” (the predicted time required for execution of JOB 1) in the data item 3, “600.0 seconds” (the predicted amount of CPU usage or the predicted CPU occupancy time required for execution of JOB 1) in the data item 4, “9%” (the predicted CPU utilization to be increased by the execution of JOB 1) in the data item 5, and “4.5 MB” (the predicted amount of memory usage used by JOB 1) in the data item 6. The record indicates that the prediction of the time and load required for the execution of JOB 1 was recorded at 2006/02/01 10:00:00.050 and the contents of the prediction are recorded in the data item 2 and the following columns. Although the data item 7 and the following columns are not shown in the drawings, the necessary items are predicted depending on the embodiment, and the prediction result is stored. In the operation data, a record with the record type being “20” is hereinafter referred to as “job execution prediction data”.
The time required for the execution of the batch job and the CPU utilization have different predicted values depending on the execution server. In the drawings, however, the differences between each execution server are not shown. For example, if the difference in hardware of the execution servers 103-1, 103-2, . . . , 103-N is negligible, it is sufficient to record one CPU utilization in one data item. Meanwhile, if the hardware performance of each of the execution servers 103-1, 103-2, . . . , 103-N is so different that it is not negligible, the CPU utilization for each execution server is predicted, for example, and each predicted value may be stored in separate columns.
Alternatively, one CPU utilization is recorded as a reference, and the CPU utilization in each of the execution servers 103-1, 103-2, . . . , 103-N may be converted from the reference by a prescribed method.
The content of a third record has "2006/02/01 10:55:30.010" in the storage date and time, "30" (a code indicating the end of the batch job execution) in the record type, "JOB 1" in the data item 1, "582.0 seconds" (the actual measurement of the amount of CPU used by JOB 1) in the data item 2, "10%" (the actual measurement of CPU utilization increased by JOB 1) in the data item 3, "4.3 MB" (the actual measurement of the amount of memory used by JOB 1) in the data item 4, "5%" (the actual measurement of the fraction of memory used by JOB 1) in the data item 5, and "16000" (the number of physical I/O generated by JOB 1) in the data item 6. The record indicates that the end of the execution of JOB 1 was recorded at 2006/02/01 10:55:30.010, and the actual measurements of the load required for the execution are recorded in the data item 2 and the following columns. Although the data item 7 and the following columns are not shown, the necessary items are measured depending on the embodiment, and the actual measurement is recorded. In the operation data, a record with the record type being "30" is hereinafter referred to as "job actual data".
The content of a fourth record has "2006/02/01 10:55:30.100" in the storage date and time, "90" (a code indicating the end of the whole series of processes relating to the batch job) in the record type, and "JOB 1" in the data item 1. The columns of the data item 2 and the following are not used. The record indicates that the end of the whole series of processes relating to JOB 1 was recorded at 2006/02/01 10:55:30.100. In the operation data, a record with the record type being "90" is hereinafter referred to as "job end data".
It should be noted that the operation data is not limited to the above four types, but an arbitrary type can be added depending on the embodiment. For example, data corresponding to the server load information shown in
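As an illustration only (a hypothetical layout, not the original record format), one row of the operation data could be represented in Python as follows, using the record-type codes described above.

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class OperationRecord:
        """One row of the operation data (illustrative layout).

        record_type uses the codes described above, e.g. 20 = job execution
        prediction data, 30 = job actual data, 90 = job end data; further types,
        such as the job start data, can be added depending on the embodiment.
        """
        stored_at: datetime              # storage date and time
        record_type: int                 # record type code
        data_items: list = field(default_factory=list)   # data item 1, 2, 3, ...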
The
The example of
The input data volume (the number of input records), the amount of CPU usage, the CPU utilization, the amount of memory usage, the memory utilization, the number of physical I/O issues, the amount of file usage, the number of used files, the file occupancy time, the user resource conflict, the system resource conflict, the waiting time when a conflict occurs, and others can be used as the data types of the batch job characteristics. In accordance with the embodiment, the necessary data types can be used as the batch job characteristics.
Note that
The items shown in
If the difference in hardware performance among the execution servers 103-1, 103-2, . . . , 103-N is not negligible, in some cases the batch job characteristics of some data types should be recorded for each execution server. For example, because the execution time, the CPU utilization, etc. are influenced by the hardware performance of the execution server, it is desirable in some cases to record these items of the batch job characteristics for each execution server. On the other hand, because the amount of memory usage, the number of physical I/O issues, etc. are not normally influenced by the hardware performance of the execution server, these items of the batch job characteristics do not need to be recorded for each execution server.
The premise of the example of
Based on the above premise, the server load information is measured and recorded every 10 minutes every day from 00:00 to 23:50, for example. Because of the premise that the load at a certain time of day is approximately the same at the same time on any day, the process overwrites the record of the same time of the previous day. The data at the latest measurement time is additionally recorded separately as a special "latest state" data block. In other words, for each of the execution servers 103-1, 103-2, . . . , 103-N, 145 data blocks ((60÷10)×24+1=145) are recorded (a data block hereinafter indicates a plurality of rows grouped for every value of the extraction time period shown as in
As described above, the server load information is recorded at a specific time point. Because the server load status at a specific time point can be considered as representative of that of a certain time period, the recorded server load information can be considered as a representation of the certain period. For example, the server load information recorded every 10 minutes can be considered as a representation of the load status of 10-minute period. Therefore, the server load information may have an item of “extraction time period”.
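The block count and the 10-minute slot labelling described above can be sketched as follows (a minimal illustration assuming the one-day, 10-minute-interval embodiment; the function name is hypothetical).

    def slot_label(minutes_of_day: int) -> str:
        """Label of the 10-minute extraction time period containing a time of day."""
        slot = (minutes_of_day // 10) * 10
        return f"{slot // 60:02d}:{slot % 60:02d}"

    SLOTS_PER_DAY = (60 // 10) * 24      # 144 periodic blocks, 00:00 through 23:50
    TOTAL_BLOCKS = SLOTS_PER_DAY + 1     # plus one "latest state" block = 145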
In the following description, the individual data recorded as above is explained using an example of a 00:10 block in
The measurement and recording can be performed at an interval other than 10 minutes depending on the embodiment. In practice, there are batch jobs executed in other cycles, such as a weekly operation and a monthly operation. Therefore, the extraction date and time rather than the extraction time period (extraction time) may be recorded. In such a case, it is favorable to accumulate an amount of server load information appropriate for the period, rather than accumulating the server load information of only the most recent day (i.e., 24 hours) as in the above example. For example, it is desirable that, when the batch system 101 is influenced by each period of the monthly operation, weekly operation, and daily operation, the server load information for one month, which is the longest period, is accumulated, and the block at the same time in the previous month is overwritten. Note that the appropriate period varies depending on the embodiment; however, in general, since many batch jobs are executed regularly, the load status of the execution servers has periodicity to a certain extent.
The items shown in
It should be noted that the distribution conditions can be represented by an arbitrary format other than the one shown in
In step S101, the job receiving subsystem 105 of the receiving server 102 receives a batch job execution request. The batch job in the flowchart of
In step S102, for the current batch job, the job receiving subsystem 105 requests the operation data extraction subsystem 107 to add the job start data (
In step S103, the operation data extraction subsystem 107 adds the job start data to the operation data. In other words, the job start data is recorded in the operation data in the repository 104. Afterwards, the process proceeds to step S104. Step S103 corresponds to (3) of
In step S104, the job receiving subsystem 105 requests the job distribution subsystem 106 to select an execution server executing the current batch job from the execution server group 103 and to cause the selected execution server to execute the current batch job. Afterwards, the process proceeds to step S105. Step S104 corresponds to (4) of
In step S105, the job distribution subsystem 106 predicts the time required for the execution of the current batch job and determines an optimal execution server within the predicted time. Here, assume that the execution server 103-s is selected (1≦s≦N). Details of the process in step S105 are explained in combination with
In step S106, the job distribution subsystem 106 requests the job execution subsystem 109 in the execution server 103-s to execute the current batch job. Here, communication between the receiving server 102 and the execution server 103-s is performed. Afterwards, the process proceeds to step S107. Step S106 corresponds to (6) of
In step S107, the job execution subsystem 109 in the execution server 103-s requests the performance information collection subsystem 111 in the execution server 103-s to record data corresponding to the batch job characteristics data of the current batch job. Specifically, the job execution subsystem 109 requests to measure and record the data values of the data items (e.g. the amount of memory usage) included in the job actual data of the operation data (
In step S108, the performance information collection subsystem 111 requests the operation data extraction subsystem 110 to record the job actual data based on the load status monitored by the performance information collection subsystem 111, and then provides the monitored data to the operation data extraction subsystem 110. Based on the request, the operation data extraction subsystem 110 adds (or records) the job actual data to the operation data in the repository 104. The process proceeds to step S109. Step S108 corresponds to (8) of
In step S109, the job execution subsystem 109 notifies the job receiving subsystem 105 of the end of the execution of the current batch job. In this step, like step S106, communication is performed between the receiving server 102 and the execution server 103-s. Afterwards, the process proceeds to step S110. Step S109 corresponds to (9) of
In step S110, for the current batch job, based on the notification, the job receiving subsystem 105 requests the operation data extraction subsystem 107 to add the job end data (
In step S111, the job receiving subsystem 105 requests that the job information update subsystem 108 update the batch job characteristics in the repository 104. The process proceeds to step S112. Step S111 corresponds to (11) of
In step S112, the job information update subsystem 108 updates the batch job characteristics of the current batch job. In other words, the storage content of the repository 104 is updated. The update is performed based on the job actual data recorded in step S108; the details are described later. After the execution of step S112, the process ends. Step S112 corresponds to (12) of
The parameters used in
In step S201, the repository 104 is searched to determine whether or not the batch job characteristics (
In step S202, based on the result determined in step S201, it is determined whether the batch job characteristics corresponding to the current batch job are present or absent. If they are present, the determination is Yes, and the process moves to step S203. If they are absent, the determination is No, and the process moves to step S214.
In step S203, the input data volume of the current batch job is obtained. Based on the input data volume and the batch job characteristics stored in step S201, the time required for the current batch job execution is predicted. The input data volume can be represented by the number of transactions, for example, or may be represented by volume on the basis of a plurality of factors, such as the number of transactions and the number of data items included in one transaction. For example, if input data is provided in a form of a text file and input data of one transaction is written in one line, the number of lines of the text file is obtained and can be used as the input data volume.
For example, in the example of batch job characteristics of
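One straightforward form of the prediction in step S203, consistent with the per-transaction process time T and the JOB 1 figures used elsewhere in this description (1000 transactions, 3300 seconds), is sketched below in Python; the function and key names are hypothetical and only one possible embodiment is shown.

    def predict_execution_time(job_characteristics: dict, num_transactions: int) -> float:
        """Predict the time (seconds) required to execute a batch job (step S203).

        job_characteristics is assumed to hold T, the process time per
        transaction derived from past executions (hypothetical key name).
        """
        time_per_tx = job_characteristics["time_per_tx"]   # T: seconds per transaction
        return time_per_tx * num_transactions              # predicted duration d

For example, predict_execution_time({"time_per_tx": 3.3}, 1000) yields 3300.0 seconds, matching the job execution prediction data example given earlier.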
In step S204, 0 is assigned to the subscript j designating the execution server for initialization. The process then proceeds to step S205.
An iteration loop is formed by the steps from step S205 to step S211. In step S205, 1 is first added to j, and the execution server 103-j is selected as the server load prediction target. The process proceeds to step S206.
In step S206, in the server load information (
In an embodiment with a different period of server load status change, appropriate data in accordance with the period is loaded. For example, in a case of the monthly period, the server load information is accumulated for one month and the server load information of the blocks of the time within the time range from t1 to t2 of the day of the previous month is loaded. When the necessary data is loaded, the process proceeds to step S207.
In step S207, the mean value of the load of the execution server 103-j in the execution range of the current batch job is calculated for each server load information data type. The mean value calculated for the k-th data type among the L data types is designated as Mjk and is stored in the memory etc. of the receiving server 102. As described in the explanation of
The server load mean value calculated in step S207 is the mean value over the execution range of the current batch job. This is a feature of the present invention. By having this feature, compared with the conventional systems, a more appropriate selection of the batch job execution server can be made and the distribution efficiency can be improved. In other words, by considering the load status over the execution range of the current batch job, rather than considering only the server load status immediately prior to the execution of the batch job as in the conventional systems, a more appropriate selection can be achieved. Because the range for calculation of the mean value Mjk is a specific time range, namely the execution range of the current batch job, Mjk is a more accurate predicted value than a load status mean value over a roughly defined range unrelated to the current batch job execution range, such as a monthly load status mean value.
Note that in the example of
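A minimal sketch of the step S207 calculation is given below, assuming that the server load information blocks whose times fall within [t1, t2) have already been loaded in step S206 and that each block is a list of data values indexed by data type (a hypothetical structure).

    def mean_load_over_range(load_blocks: list, k: int) -> float:
        """Mean value M_jk of the k-th server load data type over the execution range.

        load_blocks: the blocks of server load information of execution server
        103-j whose times fall within the execution range of the current batch job.
        """
        values = [block[k] for block in load_blocks]
        return sum(values) / len(values)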
When the mean values Mjk are calculated for all k where 1≦k≦L in step S207, the process proceeds to step S208. Step S208 to step S210 are steps used for a more accurate determination of an optimal execution server in designating the future time close to when the process of
In step S208, for each server load information data type, the difference Djk between the mean value Mjk and the data value Sjk of the server load information at time t1 is calculated. This can also be represented as Djk=Mjk−Sjk. It should be noted that, because the server load information is recorded at a certain interval, data for exactly the same time as time t1 is not necessarily present. In such a case, Sjk can be calculated by interpolation between the server load information before time t1 and that after time t1, or can be substituted by the server load information at the time immediately before or immediately after time t1. When the difference Djk has been calculated for all k where 1≦k≦L, the process proceeds to step S209.
In step S209, for all k where 1≦k≦L, Djk is added to the data value Cjk of the k-th data type of the server load information in the "latest state" block loaded in step S206 to calculate Ajk. Ajk corresponds to Mjk corrected in order to improve its reliability. The reason for the improvement is provided below.
As is clear from the operations in step S206 through step S208, Mjk and Sjk are values calculated based on data from the past. The present invention is premised on the load status of the execution server having periodicity, so that the future load status can be predicted from past load information by using this periodicity. However, the prediction has errors. Meanwhile, since Cjk is the latest actual measurement, this information is highly reliable. As stated above, t1 is a time close to the point in time at which the process of
For example, in a case as in
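A sketch of this correction (steps S208 and S209) is shown below; it is an illustration only, with hypothetical function and variable names.

    def corrected_mean(m_jk: float, s_jk: float, c_jk: float) -> float:
        """Correct the predicted mean load M_jk with the latest measurement C_jk.

        D_jk = M_jk - S_jk is the gap between the mean over the execution range
        and the past load at the start time t1 (step S208); adding D_jk to the
        latest actual measurement C_jk yields the corrected prediction A_jk (step S209).
        """
        d_jk = m_jk - s_jk
        return c_jk + d_jk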
After calculating Ajk for all k where 1≦k≦L in step S209, the process proceeds to step S210.
In step S210, for all k where 1≦k≦L, the load Xjk caused by the execution of the current batch job is predicted using the batch job characteristics of the current batch job. The batch job characteristics of the current batch job have already been stored in the memory etc. in step S201. The load status of the execution server 103-j in the execution range of the current batch job when executing the current batch job is then predicted for all k where 1≦k≦L, based on Xjk and Ajk. The predicted value is stored as Yjk.
For example, in the example of the batch job characteristics of
In addition, if the difference in hardware performance among the execution servers 103-1, 103-2, . . . , 103-N is negligible, the values Xjk for all j where 1≦j≦N are considered to be equal. In such a case, Xjk does not have to be calculated every time the process in step S210 is executed in the iteration loop from step S205 to step S211. Only Xjk where j=1 (=X1k) needs to be calculated, and the calculated and stored X1k can be used as Xjk where j>1.
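One possible way to derive Xjk from the per-transaction batch job characteristics, and then Yjk, is sketched below; this assumes a simple proportional model and additive combination, which is only one embodiment, and the key name is hypothetical.

    def predict_job_load(job_characteristics: dict, num_transactions: int,
                         per_tx_key: str) -> float:
        """Predict X_jk, the load the current batch job adds for one data type.

        per_tx_key names the per-transaction figure of the relevant data type
        (e.g. CPU seconds per transaction). When hardware differences are
        negligible, the result X_1k is reused as X_jk for every j > 1.
        """
        return job_characteristics[per_tx_key] * num_transactions
        # Y_jk is then obtained in step S210 as A_jk + X_jk.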
When Yjk is calculated for all k where 1≦k≦L in step S210, the process proceeds to step S211.
In step S211, it is determined if the load status in the execution range of the current batch job when executing the current batch job is calculated for all execution servers. In other words, it is determined if j=N or not. If the calculation has been performed for all execution servers (j=N), the determination is Yes and the process proceeds to step S212. If not (j<N), the determination is No and the process returns to step S205. Note that it is obvious from steps S204, S205, and S211 that j>N cannot occur.
In step S212, the execution server of the current batch job is determined according to Yjk calculated in step S210 and the distribution conditions stored in the repository 104. When the distribution conditions are the same as
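Assuming, purely as an example, a distribution condition of "select the server with the lowest predicted CPU utilization", step S212 could be sketched as follows; y is assumed to map each execution server to its list of predicted values Yjk indexed by data type (hypothetical structure and names).

    def choose_server(y: dict, cpu_type_index: int):
        """Step S212: apply one possible distribution condition to the predictions Y.

        Returns the execution server whose predicted CPU utilization Y_jk over
        the execution range of the current batch job is the lowest.
        """
        return min(y, key=lambda server: y[server][cpu_type_index])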
In step S213, the job distribution subsystem 106 causes the operation data extraction subsystem 107 to add the job execution prediction data to the operation data (
If the determination is No in step S202, the process moves to step S214. Step S214 through step S216 are steps for exceptional processes. In regards to the server load information (
In step S214, 0 is assigned to the subscript j designating the execution server for initialization. The process proceeds to step S215.
An iteration loop is formed by the steps from step S215 to step S216. In step S215, 1 is first added to j. Then, among the server load information stored in the repository 104, the data of the "latest state" block of the execution server 103-j is loaded. The data value corresponding to the k-th data type of the execution server 103-j is designated as Yjk and is stored in the memory etc. of the receiving server 102. After Yjk for all k where 1≦k≦L are stored, the process moves on to step S216.
In step S216, it is determined whether or not the server load information of the “latest state” blocks of all execution servers is loaded. In other words, it is determined if j=N or not. If the server load information for all execution servers has been loaded (j=N), the determination is Yes, and the process moves to step S212. If not (j<N), the determination is No, and the process returns to step S215. Note that j>N cannot occur.
As described above, in step S212, the execution server is selected in accordance with the distribution conditions. In other words, the path from step S216 to step S212 is equivalent to the conventional methods, in that the execution server of the batch job is selected based only on the load status close to the point in time when the batch job execution request is issued.
As is clear from the descriptions on
For example, an execution server whose predicted execution time is longer than a prescribed threshold may be excluded, or the predicted execution times may be compared within the execution server group 103 and the execution servers to be excluded may be determined from their relative order, etc. Alternatively, a condition regarding the execution time may be included in the distribution conditions used in step S212.
In step S301, from the operation data (
In step S302, the process time of the current batch job is calculated from the difference between the storage date and time of the job end data and that of the job start data. Afterwards, the process time per transaction T is calculated, and the process proceeds to step S303. Depending on the embodiment, the process time or T of the current batch job may be recorded in the job actual data so that it can be loaded in step S302. T may be calculated by dividing the difference between the storage date and time of the job end data and that of the job start data by the number of transactions. Alternatively, other methods can be employed to calculate T (for example, in the case of a batch job including a process that requires a certain time period regardless of the amount of input data).
In step S303, among the data items of the job actual data loaded in step S301, the data value per transaction is calculated for the items to be recorded as the batch job characteristics. When the number of data types to be recorded as the batch job characteristics is designated as B, for all i where 1≦i≦B, a data value per transaction Ci is calculated based on the data value in the job actual data corresponding to the i-th data type and the number of transactions. Ci can be obtained by dividing the data value in the job actual data corresponding to the i-th data type by the number of transactions, for example. For a data type to which a simple division is not applicable, other methods can be employed for the calculation. For example, simple division is not applicable to the amount of memory usage in some cases, since the amount of memory usage includes a part used regardless of the number of transactions, such as that used for program loading, in addition to a part used approximately in proportion to the number of transactions. When Ci for all i where 1≦i≦B has been calculated, the process proceeds to step S304.
In step S304, a prediction error per transaction Ei corresponding to the i-th data type is calculated for all i where 1≦i≦B. Specifically, the data values of the data items corresponding to the i-th data type are obtained from each of the job execution prediction data and the job actual data loaded in step S301, and the difference between the two data values is calculated. Based on this difference and the number of transactions, the prediction error per transaction Ei is calculated. Like Ci, Ei can be calculated by division; however, other calculation methods can also be employed. When Ei has been calculated for all i where 1≦i≦B, the process proceeds to step S305.
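A minimal sketch of the per-transaction calculations in steps S302 through S304 is given below, assuming the simple-division case for every data type (the text above notes that some types may need a different method); all names are hypothetical.

    def per_transaction_values(start_ts, end_ts, num_transactions: int,
                               actual: dict, predicted: dict) -> dict:
        """Compute T, C_i, and E_i for one batch job execution.

        actual and predicted map hypothetical data-type names to the measured
        totals in the job actual data and the job execution prediction data.
        """
        t = (end_ts - start_ts).total_seconds() / num_transactions          # T (step S302)
        c = {name: value / num_transactions for name, value in actual.items()}   # C_i (step S303)
        e = {name: (predicted[name] - actual[name]) / num_transactions           # E_i (step S304)
             for name in actual}
        return {"T": t, "C": c, "E": e}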
In step S305, it is determined whether the batch job characteristics of the current batch job are present in the repository 104. When they are present, the determination is Yes and the process proceeds to step S307. When they are absent, the determination is No and the process proceeds to step S306. The determination is the same as that of step S201 and step S202 of
In step S306, the batch job characteristics data of the current batch job are generated from T, Ci, and Ei and are added to the repository 104. Depending on the data type of the batch job characteristics, the values of T, Ci, and Ei are used as the data values of the batch job characteristics either without any processing or after some processing.
In step S307, the batch job characteristics of the current batch job are updated based on T, Ci, and Ei. For example, in an embodiment that records the past mean values as the batch job characteristics, the batch job characteristics are updated to the weighted means of the currently recorded data values of the batch job characteristics and the values of T, Ci, or Ei corresponding to the data type of each data value. The weights used for the weighted means can be determined, for example, according to the total number of transactions in the past recorded as the data of the batch job characteristics and the number of transactions in the execution of the current batch job. In another embodiment, the values of T, Ci, and Ei from the latest execution themselves may be recorded as the batch job characteristics. In a further embodiment, the values of T, Ci, and Ei from the previous n executions (n is a predetermined constant) immediately before the current batch job are recorded as the batch job characteristics, and the mean values of these n sets of data may be recorded in addition to the values above. All embodiments share the point that the update based on T, Ci, and Ei is performed in step S307.
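For the weighted-mean embodiment described above, the update of a single characteristic value in step S307 could look like the following sketch (hypothetical names; the weights are the transaction counts, as one of the possibilities mentioned in the text).

    def update_characteristic(old_value: float, old_total_tx: int,
                              new_value: float, new_tx: int) -> float:
        """Update one batch job characteristic as a weighted mean (step S307).

        The recorded value is weighted by the total number of transactions
        reflected so far, and the value from the current execution by the
        number of transactions of the current execution.
        """
        total = old_total_tx + new_tx
        return (old_value * old_total_tx + new_value * new_tx) / total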
After the end of step S306 or step S307, the update process of the batch job characteristics ends.
According to the present invention, since the batch job characteristics are recorded and updated automatically as described above, correct acquisition of the batch job characteristics, which was difficult with the conventional systems, is facilitated. Since the batch job characteristics are updated at every batch job execution, even if the batch job characteristics change due to a change in the operation of the batch job, the batch job characteristics are automatically updated in accordance with that change.
In the following description, for purpose of simplicity, an example of a process performed in the execution server 103-a (1≦a≦N) at time t is explained.
In step S401, the server information extraction subsystem 112 of the execution server 103-a requests the performance information collection subsystem 111 of the execution server 103-a to extract the load information of the execution server 103-a. Afterwards, the process proceeds to step S402. The step S401 corresponds to (13) of
In step S402, the performance information collection subsystem 111 extracts the current load information of the execution server 103-a and returns the result to the server information extraction subsystem 112. The information extracted at this point is a data value corresponding to each data type of the server load information of
In step S403, the server load information in the repository 104 is updated based on the data that the server information extraction subsystem 112 received in step S402. In the case of one-day period as in the example of
After the process of step S403, the process for updating the server load information ends.
Note that the block to be updated in step S403 varies depending on the time period to accumulate the server load information as in the explanation of
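For the one-day-period embodiment, the update of step S403 for one execution server could be sketched as follows; the repository layout and key names are hypothetical (they follow the illustrative structure shown earlier), and only the overwrite of the matching time slot and of the "latest state" block is shown.

    from datetime import datetime

    def update_server_load(repo: dict, server: str, measured: dict,
                           now: datetime) -> None:
        """Step S403 (one-day period): overwrite the slot block and the
        "latest state" block for one execution server."""
        slot = f"{now.hour:02d}:{(now.minute // 10) * 10:02d}"   # 10-minute slot label
        blocks = repo["server_load_info"].setdefault(server, {})
        blocks[slot] = dict(measured)      # overwrite the same time slot of the previous day
        blocks["latest"] = dict(measured)  # keep the latest measurement separately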
In an embodiment other than the above, the server load information is recorded once in the repository 104 as the operation data (
Each of the receiving server 102 and the execution servers 103-1, 103-2, . . . , 103-N constituting the batch job system 101 according to the present invention is realized as a common information processor (computer) as shown in
The information processor of
The receiving server 102 and each of the execution servers 103-1, 103-2, . . . , 103-N can communicate with each other via the respective communication interfaces 203 and a network 209. For example, step S106 and step S109 etc. of
For the storage device 204, various storage devices such as a hard disk and a magnetic disk can be used.
The repository 104 may be provided in the storage device 204 in any of the servers of the receiving server 102 or the execution server group 103. In such a case, the server, where the repository 104 is provided, performs the reference/update of the data in the repository 104 through the processes shown in
The program according to the present invention etc. is stored in the storage device 204 or ROM 201. The program is executed by CPU 200, resulting in the batch job distribution of the present invention being executed. During the execution of the program, data is read from the storage device in which the repository 104 is provided as needed. The data is stored in a register in CPU 200 or RAM 202 and is used for the process in CPU 200. The data in the repository 104 is updated accordingly.
The program according to the present invention may be provided from a program provider 208 via the network 209 and the communication interface 203. It may be stored in the storage device 204, for example, and may be executed by CPU 200. Alternatively, the program according to the present invention may be stored in a commercially distributed portable storage medium 210, and the portable storage medium 210 may be set in the driving device 206. The stored program may be loaded to RAM 202, for example, and can be executed by CPU 200. Various storage media such as a CD-ROM, a flexible disk, an optical disk, a magneto-optical disk, and a DVD may be used as the portable storage medium 210.
Number: 2006-070814; Date: Mar 2006; Country: JP; Kind: national.