This patent application claims priority to International Patent Application No. PCT/EP2008/060297, which was filed under the Patent Cooperation Treaty (PCT) on Aug. 5, 2008, and claims priority to European Patent Application No. 07119759.4 filed with the European Patent Office on Oct. 31, 2007, said applications expressly incorporated herein by reference in their entireties.
The present invention relates to a method, system and computer-usable medium for distributing a plurality of jobs to a plurality of computers.
Workload scheduling is an increasingly important component of IT (Information Technology) environments. Indeed, many grid computing environments are driven by the scheduling of work across a distributed set of resources (e.g. computation, storage, communication capacity, software licenses, special equipment etc.). Most grid systems include a job scheduler to locate a machine on which to run a job submitted by a user. Simple job schedulers assign jobs to the next available machine whose resources match the needs of the job. However, more advanced schedulers implement a job priority system, wherein jobs with higher priorities are preferentially allocated to grid machines. Schedulers may also implement policies, providing various constraints on jobs, users, and resources (e.g. restricting grid jobs from executing at certain times of the day).
In essence, scheduling is an optimization problem, which is fairly straightforward when only one resource type (e.g. CPU) is involved. Further performance improvements may be achieved by including more resource variables in the scheduling process; however, the resulting multivariate optimization becomes a difficult mathematical problem. In attempting to simplify the problem, prior art workload distribution algorithms typically assume that the distribution problem is one of deploying substantially homogeneous jobs in a substantially homogeneous environment. Thus, these prior art algorithms fail to recognize that different jobs often have different resource requirements. Similarly, prior art algorithms typically fail to recognize the extent to which a given job deployment may influence subsequent jobs, thereby affecting the overall job throughput of a system.
The following summary is provided to facilitate an understanding of some of the innovative features unique to the present invention and is not intended to be a full description. A full appreciation of the various aspects of the embodiments disclosed herein can be gained by taking the entire specification, claims, drawings, and abstract as a whole.
A method, system and/or computer-usable medium are disclosed for distributing a plurality of jobs to a plurality of computers. Such a method and system includes determining every possible pairing of the jobs to the computers to produce a plurality of provisional distributions; ranking the provisional distributions according to the extent to which they satisfy the individual requirements of each job; ranking the provisional distributions according to the extent to which they match a predefined distribution; ranking the provisional distributions according to the extent to which they maximize a throughput of the computers; determining an optimal distribution from the rankings; and deploying the or each of the jobs to the or each of the computers in accordance with the optimal distribution.
In contrast with prior art workload-scheduling systems, the preferred embodiment considers the different aspects of both the jobs to be distributed and the resources to which the jobs might be distributed. To this end, one of the underlying principles of the preferred embodiment is that each job has different characteristics and should be individually optimized according to its own requirements.
Furthermore, the preferred embodiment differs from prior art workload-scheduling systems insofar as it recognizes the influence of a given job distribution on the operation of other subsequent jobs. In particular, the preferred embodiment recognizes that the deployment of a job to a given resource may cause that resource to be effectively reserved by the job, thereby preventing other subsequent jobs from being deployed to the resource in question.
The accompanying figures, in which like reference numerals refer to identical or functionally-similar elements throughout the separate views and which are incorporated in and form a part of the specification, further illustrate the present invention and, together with the detailed description of the invention, serve to explain the principles of the present invention.
The invention itself, as well as further features and the advantages thereof, will be best understood with reference to the following detailed description, given purely by way of a non-restrictive indication, to be read in conjunction with the accompanying drawings, in which:
The particular values and configurations discussed in these non-limiting examples can be varied and are cited merely to illustrate at least one embodiment and are not intended to limit the scope of such embodiments.
A. Overview
The preferred embodiment provides a mechanism for determining an optimal workload distribution, from a plurality of candidate workload distributions, each of which has been determined to optimize a particular aspect of a workload-scheduling problem. More particularly, referring to
From either or both of the above parameters, the preferred embodiment determines (14) a workload distribution based on a total prioritized weight parameter. The preferred embodiment also determines (16) a workload distribution which attempts to match the previously determined candidate workload distributions to a goal distribution. Similarly, the preferred embodiment calculates (18) a further workload distribution which attempts to maximize job throughput. From all these workload distributions, the preferred embodiment calculates (20) an overall optimal workload distribution, which effectively considers the following configurable factors: resource selection policies of each job; job priority (optional); work balancing; and job throughput.
In determining the overall optimal workload distribution, the preferred embodiment assesses and compares the merits of candidate job distributions on the basis of the above variables, by normalizing each variable to a common scale (the interval [0, 100]) and developing an objective function which includes all of the variables.
For the sake of example, the preferred embodiment is described within the context of an IBM Tivoli Dynamic Workload Broker (ITDWB). However, it will be appreciated that the preferred embodiment is not limited to this implementation and could instead be implemented in a variety of different software platforms.
B. More Detailed Description
Step 1: Determining a Normalized Independent Workload Distribution Based on Resource Selection Policy
A job to be executed typically has a set of resource constraints (e.g. available disk space, memory etc.) which determine the targets (i.e. "capable" resources) to which the job can be submitted for execution. Each job also has a resource selection policy, which ranks the capable resources that satisfy the requirements of the job. These policies essentially comprise the maximization or minimization of a particular attribute of a capable resource. For example, a job that requires a lot of CPU processing may have "MINIMIZE CPU UTILIZATION" as its resource selection policy. In this case, capable resources that have less CPU utilization are preferred. Similarly, a job that requires a lot of free memory may have "MAXIMIZE FREE MEMORY" as its resource selection policy. In this case, capable resources that have more free memory are preferred.
Thus, for simplicity, let there be $N$ jobs $J \in R^{N}$ to be executed, and let there be $m(i)$ capable resources $R_i \in R^{m(i)}$ for each job $j_i \in J$. Each job $j_i$ has a resource selection policy of the form MIN|MAX $RAttr(r_{ip}, j_i)$ ($i = 1, \dots, N$; $p = 1, \dots, m(i)$), where $RAttr$ is the attribute to be minimized or maximized. Referring to
The preferred embodiment orders the capable resources of each job $j_i$ based on the job's resource selection policy and the values of the relevant attributes of the capable resources. For example, if $RAttr(r_{11}, j_1) > RAttr(r_{12}, j_1)$ and the resource selection policy requires the minimization of the relevant attribute, the ordered resource list for job $j_1$ becomes $r_{12}, r_{11}$. More generally and referring to
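The ordering step can be sketched in Python as follows. This is a minimal illustration rather than the ITDWB implementation: the `SelectionPolicy` class, the `order_capable_resources` function and the attribute names are hypothetical, and each capable resource is assumed to be described by a dictionary of attribute values.

```python
from dataclasses import dataclass


@dataclass
class SelectionPolicy:
    """Hypothetical encoding of a MIN|MAX RAttr resource selection policy."""
    attribute: str   # e.g. "cpu_utilization" or "free_memory"
    maximize: bool   # True for MAXIMIZE, False for MINIMIZE


def order_capable_resources(resources, policy):
    """Return capable resource names ordered best-first for one job.

    `resources` maps a resource name to a dict of attribute values.
    """
    return sorted(
        resources,
        key=lambda name: resources[name][policy.attribute],
        reverse=policy.maximize,  # largest value first when maximizing
    )


# Job j1 minimizes CPU utilization, and RAttr(r11, j1) > RAttr(r12, j1).
capable = {"r11": {"cpu_utilization": 40.0}, "r12": {"cpu_utilization": 15.0}}
print(order_capable_resources(capable, SelectionPolicy("cpu_utilization", False)))
# ['r12', 'r11']
```

Because Python's `sorted` is stable, resources with equal attribute values keep their original relative order; the text does not say how ties are broken, so this is an arbitrary choice.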
Whilst this ordering process provides an indication of the best capable resource for a given job, it does not provide any information regarding the extent to which the best capable resource excels over the other capable resources. For example, the preferred embodiment may have determined that a capable resource with 10% CPU utilization is better than a first other capable resource with 11% CPU utilization, or a second other capable resource with 90% CPU utilization. However, the ordering alone does not indicate that the best capable resource is only marginally better than the first other capable resource, yet much better than the second other capable resource. Similar considerations apply to resource attributes with absolute values, such as free memory. To overcome this problem, the preferred embodiment normalizes the attribute values into weights in the range 0-100. For example, if the resource selection policy for a job $j_k$ comprises a maximization operation, the weighting $W(r_k, j_k)$ on the capable resources $r_k$ for the job $j_k$ is given by equation (1) below:
Similarly, if the resource selection policy for a job $j_k$ comprises a minimization operation, the weighting $W(r_k, j_k)$ on the capable resources $r_k$ for the job $j_k$ is given by equation (2) below:
For example, if there are three capable resources (with 10%, 11% and 90% CPU utilization) for job $j_1$, and the job's resource selection policy comprises a minimization operation, the normalizing weights for the resources are:
Similarly, if, for example, there are three resources with 1000, 900 and 100 MB of free memory for job $j_1$, and the job's resource selection policy comprises a maximization operation, the normalizing weights for the resources are:
Referring to
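Because equations (1) and (2) are not reproduced in this text, the sketch below assumes a simple min-max normalization, in which the best capable resource receives 100 and the worst receives 0. This assumption is chosen only to illustrate how normalization preserves relative distances between resources; the formulas of the preferred embodiment may differ. `normalized_weights` is a hypothetical helper.

```python
def normalized_weights(values, maximize):
    """Map raw attribute values of a job's capable resources to weights in [0, 100].

    ASSUMPTION: min-max normalization stands in for equations (1) and (2).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every resource is equally good for this attribute.
        return [100.0 for _ in values]
    if maximize:
        # Stand-in for equation (1): larger values are better.
        return [100.0 * (v - lo) / (hi - lo) for v in values]
    # Stand-in for equation (2): smaller values are better.
    return [100.0 * (hi - v) / (hi - lo) for v in values]


print(normalized_weights([10, 11, 90], maximize=False))     # CPU utilization (%)
# [100.0, 98.75, 0.0]
print(normalized_weights([1000, 900, 100], maximize=True))  # free memory
```

Under this assumption, the 11% resource scores 98.75, close to the best resource, while the 90% resource scores 0, which captures the "marginally better" versus "much better" distinction described above. The free-memory example yields approximately 100, 88.9 and 0.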
Step 2: Determining an Independent Workload Distribution Based on Job Priority
The above mechanism for determining a workload distribution essentially focuses on assessing different attributes of the available resources. However, jobs themselves have different attributes, reflected in their priorities. The preferred embodiment (optionally) considers the role of job priorities in workload distributions by calculating a prioritized weight (PW) for each capable resource of each job according to equation (3) below, where $p_k$ denotes the priority of job $j_k$:

$$PW(r_k, j_k) = p_k \cdot W(r_k, j_k) \qquad (3)$$
The resulting prioritized weights for the job are shown in
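Equation (3) simply scales each normalized weight by the job's priority. A minimal sketch, assuming a job's weights are held in a dictionary keyed by resource name and that the priority is a non-negative number (the function name and the priority value 2 below are hypothetical):

```python
def prioritized_weights(weights, priority):
    """Equation (3): PW(r, j) = p * W(r, j) for every capable resource r of job j."""
    if priority < 0:
        raise ValueError("priority must be non-negative")
    return {resource: priority * w for resource, w in weights.items()}


w_j1 = {"r11": 100.0, "r12": 98.75, "r13": 0.0}
print(prioritized_weights(w_j1, priority=2))
# {'r11': 200.0, 'r12': 197.5, 'r13': 0.0}
```

Because a positive priority multiplies every weight of a job by the same factor, it does not change that job's ranking of its own resources; it only changes how much the job counts when candidate distributions are compared in the next step.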
Step 3: Calculating a Total Prioritized Weighting Measure
The previous steps produce a particular workload distribution that may be adopted at any given time. However, these steps do not consider the temporal aspects of resource constraints and reservations. Referring to
For simplicity, and to avoid confusion with the workload distributions determined in the previous steps, the workload distributions produced by considering the temporal aspects of resource constraints and reservations will henceforth be known as temporal workload distributions. Thus, at any given moment, let there be $n$ possible temporal workload distributions $S$, wherein each $S_k$ ($k = 1, \dots, n$) comprises a set $\{(j_i, r_{il}) : i = 1, \dots, N\}$ of job-resource pairs, with $l \in [1, m(i)]$ identifying the capable resource chosen for job $j_i$. In the present step, the preferred embodiment assesses the merits of each possible temporal workload distribution $S_k$ based on its prioritized weights. To do this, the preferred embodiment calculates a total prioritized weighting value $TW(S_k)$ of all prioritized weights $PW(r_{il}, j_i)$ of each temporal workload distribution $S_k$ using equation (4) below:

$$TW(S_k) = \sum_{i=1}^{N} PW(r_{il}, j_i) \qquad (4)$$
The total prioritized weighting values $TW(S_k)$ are then normalized to the range 0-100 using equation (5) below:
Examples of the total prioritized weighting values and normalized prioritized weighting values of the temporal workload distributions in
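The enumeration and scoring of this step can be sketched as follows. The sketch enumerates every pairing of each job with one of its capable resources (the Cartesian product of the jobs' capable-resource lists), so the number of candidates is the product of all the m(i) values, and it does not model reservations. It computes TW(S_k) as the total of the chosen pairs' prioritized weights and, because equation (5) is not reproduced here, assumes that normalization divides each total by the largest total. All function names and example weights are hypothetical.

```python
from itertools import product


def candidate_distributions(pw):
    """Yield every assignment of each job to one of its capable resources.

    `pw` maps job -> {resource: prioritized weight}; each candidate S_k is
    returned as a dict job -> resource.
    """
    jobs = list(pw)
    for choice in product(*(list(pw[job]) for job in jobs)):
        yield dict(zip(jobs, choice))


def total_weight(candidate, pw):
    """TW(S_k): total of the prioritized weights of the chosen job-resource pairs."""
    return sum(pw[job][resource] for job, resource in candidate.items())


def normalize_by_max(scores):
    """ASSUMPTION standing in for equation (5): scale so the best score is 100."""
    top = max(scores)
    if top <= 0:
        return [0.0 for _ in scores]
    return [100.0 * s / top for s in scores]


pw = {"j1": {"r1": 200.0, "r2": 50.0}, "j2": {"r1": 80.0, "r2": 100.0}}
candidates = list(candidate_distributions(pw))
tw = [total_weight(c, pw) for c in candidates]
print(candidates)
print(tw)  # [280.0, 300.0, 130.0, 150.0]
print(normalize_by_max(tw))
```

The exhaustive enumeration mirrors the "every possible pairing" wording of the summary; for large numbers of jobs, a practical implementation would likely need to prune candidates that violate resource constraints.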
Step 4: Calculating a Distribution Solution Based on a Goal Distribution
In many cases, an administrator may have already specified a desired workload distribution for a system, known as a goal distribution (GD). For example, the administrator may have specified that a workload be equally distributed across all the resources; given five resources, the goal is then to distribute 20% of the jobs to each resource. However, the previous steps of the preferred embodiment may have already resulted in a workload distribution that differs from the goal distribution (GD). To address this, the preferred embodiment calculates, for each candidate distribution $S_k$, a sum delta square distribution ($\Delta D_k$) that measures how far the combination of a pre-existing distribution (PD) and the candidate distribution $S_k$ lies from the goal distribution (GD). In effect, the preferred embodiment thereby calculates a measure of the extent to which the candidate distribution achieves the goal distribution.
Let the pre-existing distribution (PD) be defined as the set $\{PNJ(R_1), PNJ(R_2), \dots, PNJ(R_m)\}$ of Pre-existing Numbers of Jobs (PNJ) for each resource, wherein $PNJ$ represents the number of jobs running when the algorithm starts calculating the new distribution based on incoming jobs. Furthermore, let the goal distribution (GD) be defined as the set $\{GD(R_1), GD(R_2), \dots, GD(R_m)\}$ of goal distribution values for each resource. A distribution solution $S_k$ indicates how a plurality of incoming jobs (received by the system in a given time interval) is to be distributed amongst a finite number ($m$) of resources. More particularly, a distribution solution $S_k$ may be generically described as a set of current numbers of jobs $CNJ_k$ distributed to each of the available resources, i.e. $S_k = \{CNJ_k(R_1), CNJ_k(R_2), \dots, CNJ_k(R_m)\}$. For example, referring to
The preferred embodiment calculates a projected distribution from the combination of the current numbers of jobs $CNJ_k(R_i)$ ($i = 1, \dots, m$) of a candidate distribution $S_k$ and the pre-existing distribution (PD), in accordance with equation (6) below.
where $m$ is the number of resources.
From this information, the preferred embodiment then calculates a sum delta square distribution $\Delta D_k$, accumulated over all resources, in accordance with equation (7) below.
$\Delta D_k$ is then normalized to the range 0-100 using equation (8):
The sum delta square distribution $\Delta D_k$ indicates the distance between the projected distribution and the goal distribution. After normalization, a higher value of the normalized delta ($\Delta ND_k$) indicates a better candidate distribution solution, i.e. one whose projected distribution lies closer to the goal distribution. Taking, for example, the weight distribution shown in
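Equations (6) to (8) are not reproduced in this text, so the following sketch relies on stated assumptions: the projected distribution is taken as each resource's percentage share of all jobs (pre-existing plus those of candidate S_k), ΔD_k is the sum over resources of the squared differences from the goal percentages, and the normalization maps the smallest distance to 100 and the largest to 0, so that a higher value means a better candidate. The function names and example numbers are hypothetical.

```python
def projected_shares(pre_existing, current):
    """ASSUMED equation (6): percentage of all jobs on each resource."""
    totals = [p + c for p, c in zip(pre_existing, current)]
    grand = sum(totals)
    if grand == 0:
        return [0.0 for _ in totals]
    return [100.0 * t / grand for t in totals]


def sum_delta_square(projected, goal):
    """ASSUMED equation (7): squared distance from the goal distribution."""
    return sum((p - g) ** 2 for p, g in zip(projected, goal))


def normalize_inverted(deltas):
    """ASSUMED equation (8): smallest distance -> 100, largest -> 0."""
    lo, hi = min(deltas), max(deltas)
    if hi == lo:
        return [100.0 for _ in deltas]
    return [100.0 * (hi - d) / (hi - lo) for d in deltas]


pnj = [2, 0]                           # PNJ(R1), PNJ(R2): jobs already running
goal = [50.0, 50.0]                    # GD: an even split across two resources
candidates = [[2, 0], [1, 1], [0, 2]]  # CNJ_k(R1), CNJ_k(R2) for three candidates
deltas = [sum_delta_square(projected_shares(pnj, c), goal) for c in candidates]
print(deltas)                          # [5000.0, 1250.0, 0.0]
print(normalize_inverted(deltas))      # [0.0, 75.0, 100.0]
```

In this example the third candidate, which sends both new jobs to R2, scores best because the two pre-existing jobs already sit on R1.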
Step 5: Calculating the Best Distribution Solution Based on the Goal of Maximizing Throughput
Depending on resource constraints and reservations, the deployment of a job to a given resource (in a particular workload distribution) may prevent the deployment of subsequent jobs to that resource until the first job is finished. As a result, there may, in fact, be different numbers of jobs active at any given time during the implementation of a given workload distribution, which makes it difficult to maintain a high throughput. Thus, the preferred embodiment calculates a measure of the extent to which a workload distribution maximizes the throughput of a system. More particularly, given a workload distribution $S_k$ comprising the set $\{CNJ_k(R_1), CNJ_k(R_2), \dots, CNJ_k(R_m)\}$ of Current Numbers of Jobs (CNJ) for each resource, the total number of jobs $TNJ_k$ deployed in that workload distribution is calculated using equation (9):

$$TNJ_k = \sum_{i=1}^{m} CNJ_k(R_i) \qquad (9)$$
The total number of jobs $TNJ_k$ for each solution is normalized to a value in the range 0-100 to obtain a Total Job Distribution ($TJD_k$) for each workload distribution, as given by equation (10) below:
The Total Job Distribution TJD value provides a measure of how much a workload distribution maximizes job throughput. Given the distribution shown in
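Equation (9) totals the per-resource job counts of a candidate. Equation (10) is not reproduced here, so the sketch below assumes that TJD_k scales TNJ_k so that the candidate deploying the most jobs scores 100. The function names and candidate job counts are hypothetical and stand in for situations where reservations allow some candidates to start fewer jobs.

```python
def total_jobs(current):
    """TNJ_k: total number of jobs deployed by candidate S_k (equation (9))."""
    return sum(current)


def total_job_distribution(candidates):
    """ASSUMED equation (10): scale TNJ_k so the largest total scores 100."""
    counts = [total_jobs(c) for c in candidates]
    top = max(counts)
    if top == 0:
        return [0.0 for _ in counts]
    return [100.0 * n / top for n in counts]


# CNJ_k(R1..R3) for three hypothetical candidates.
candidates = [[1, 1, 0], [1, 1, 1], [2, 0, 0]]
print([total_jobs(c) for c in candidates])  # [2, 3, 2]
print(total_job_distribution(candidates))
```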
Step 6: Combining all Criteria to Get the Optimal Solution
In the previous steps, the preferred embodiment has calculated normalized measures (0-100) that describe the benefits of a given workload distribution from different perspectives. In particular, the preferred embodiment has calculated a normalized total weight ($TNW_k$), which provides a measure of the merits of a solution compared with individual job resource distributions; a normalized delta from a goal distribution ($\Delta ND_k$), which provides a measure of the merits of a solution compared with a goal distribution; and a total job distribution ($TJD_k$) value, which provides a measure of the extent to which each solution maximizes job throughput.
Using these metrics, an optimal distribution (OptD) can be defined as the distribution that maximizes the following expression:

$$OptD = \operatorname*{arg\,max}_{k} \left( \alpha\, TNW_k + \mu\, \Delta ND_k + \lambda\, TJD_k \right) \qquad (11)$$

where $\alpha, \mu, \lambda \in [0, 1]$ can be used to weight the contribution of each criterion differently, or even to exclude a criterion entirely (by setting its coefficient to 0). By default, $\alpha$, $\mu$ and $\lambda$ can be set to 1, so that the optimal solution is the one that, on average, performs best across all of the previously identified aspects.
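Equation (11) can be sketched as a weighted sum followed by a selection of the best-scoring candidate. This is a minimal illustration: the function name and input scores are hypothetical, ties are broken in favour of the earliest candidate (a choice the text does not specify), and the coefficients are checked against the [0, 1] range stated above.

```python
def optimal_distribution(tnw, delta_nd, tjd, alpha=1.0, mu=1.0, lam=1.0):
    """Return (k, scores), where k maximizes alpha*TNW_k + mu*dND_k + lam*TJD_k."""
    for coeff in (alpha, mu, lam):
        if not 0.0 <= coeff <= 1.0:
            raise ValueError("alpha, mu and lambda must lie in [0, 1]")
    scores = [alpha * a + mu * b + lam * c for a, b, c in zip(tnw, delta_nd, tjd)]
    # max() returns the first index with the highest score.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


tnw = [90.0, 100.0, 40.0]      # normalized total weights TNW_k
delta_nd = [0.0, 75.0, 100.0]  # normalized deltas from the goal
tjd = [100.0, 100.0, 60.0]     # total job distributions TJD_k
print(optimal_distribution(tnw, delta_nd, tjd)[0])                      # 1
print(optimal_distribution(tnw, delta_nd, tjd, alpha=0.0, lam=0.0)[0])  # 2
```

Setting `alpha` and `lam` to 0 reduces the choice to the goal-distribution criterion alone, which selects a different candidate in this example. The parameter is named `lam` because `lambda` is a reserved word in Python.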
Based on the examples above, the resulting OptD values are shown in
A computer on which the preferred embodiment operates has a generic structure shown in
Alterations and modifications may be made to the above without departing from the scope of the invention. It will be appreciated that variations of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. It will also be appreciated that various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may subsequently be made by those skilled in the art, which are also intended to be encompassed by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
07119759 | Oct 2007 | EP | regional |
PCT/EP2008/060297 | Aug 2008 | EP | regional |
Number | Date | Country |
---|---|---|
20090113442 A1 | Apr 2009 | US |