Resources in a large computing environment are often managed by a scheduling system. Such resources may be clusters of computers or processors, or may include other resources. Large computing tasks may be allocated across blocks of resources by a scheduling mechanism. In many cases, large computing resources may be in great demand, so efficient scheduling may better utilize such resources.
A scheduler for computing resources may periodically analyze running jobs to determine if additional resources may be allocated to the job to help the job finish quicker and may also check if a minimum amount of resources is available to start a waiting job. A job may consist of many tasks that may be defined with parallel or serial relationships between the tasks. At various points during execution, the resource allocation of active jobs may be adjusted to add or remove resources in response to a priority system. A job may be started with a minimum amount of resources and the resources may be increased and decreased over the life of the job.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Shared computing resources may be allocated to various jobs using a scheduling system. The scheduling system may include a queue for new jobs, where a priority system may determine which job will be started next. An analyzer may periodically evaluate executing jobs to identify resources that are underutilized and may determine to start a new job or allocate additional resources to existing jobs.
Each job may be defined as a series of tasks. Some tasks may be linked in a serial or parallel fashion and each task may use a range of resources. At some points during execution, multiple tasks may be executed in parallel while at other points, a task may wait for other tasks to complete before execution. During a period where parallel tasks may be performed, a job may be completed faster by applying additional resources, and during other periods, the same resources may be allocated to other jobs.
In some cases, new jobs may be started by determining the minimum amount of resources that may be used to start a job. When those resources become available, the new job may be started. As other resources become free, they may be allocated to the new job so that the new job may be quickly completed.
During the periodic analysis of resource allocation, an algorithm may be used to allocate resources among executing jobs and to decide if a new job is to be started. Different embodiments may use different algorithms. For example, one embodiment may favor completing existing jobs as soon as possible while another embodiment may favor starting new jobs as soon as possible. In some cases, individual priorities between jobs and resources may be considered in selecting a course of action.
Specific embodiments of the subject matter are used to illustrate specific inventive aspects. The embodiments are by way of example only, and are susceptible to various modifications and alternative forms. The appended claims are intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the claims.
Throughout this specification, like reference numbers signify the same elements in the description of the figures.
When elements are referred to as being “connected” or “coupled,” the elements can be directly connected or coupled together or one or more intervening elements may also be present. In contrast, when elements are referred to as being “directly connected” or “directly coupled,” there are no intervening elements present.
The subject matter may be embodied as devices, systems, methods, and/or computer program products. Accordingly, some or all of the subject matter may be embodied in hardware and/or in software (including firmware, resident software, micro-code, state machines, gate arrays, etc.). Furthermore, the subject matter may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an instruction execution system. Note that the computer-usable or computer-readable medium could be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
When the subject matter is embodied in the general context of computer-executable instructions, the embodiment may comprise program modules, executed by one or more systems, computers, or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
Many different types of computer resources may be managed. For example, individual processors in a cluster of computers or on a multi-processor computer may be assigned individual tasks that make up a computing job. In other cases, memory resources may be allocated, both random access memory and data storage memory. Network resources may be allocated as well as software licenses, computing resources, and other resources for which there may be contention. Embodiment 100 may be used to manage and allocate any computing resource that may be shared across multiple jobs.
Each job 102 may be composed of multiple tasks. The tasks may be discrete blocks of executable code or functions that may be performed. Each task may have separately defined resources that may be used by the task. For example, in a typical case of a job that is performed on a cluster of processors, an individual task may be performed on a single processor or may use a single software license. In some embodiments, each task may use multiple resources. For example, a task may be defined that uses four processors or defined to run with a minimum of two processors and a maximum of eight processors.
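The task definitions described above can be sketched as a small data model. This is a minimal illustration only: the names `Task`, `Job`, `min_resources`, and `max_resources` are hypothetical and do not come from the specification, and a single undifferentiated resource type (such as processors) is assumed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """One unit of work and the range of resources it can use (hypothetical model)."""
    name: str
    min_resources: int = 1
    max_resources: int = 1

    def __post_init__(self):
        # A task must need at least one resource and min must not exceed max.
        if not 1 <= self.min_resources <= self.max_resources:
            raise ValueError("need 1 <= min_resources <= max_resources")


@dataclass
class Job:
    """A job is a named collection of tasks."""
    name: str
    tasks: list = field(default_factory=list)


# A task fixed at four processors, and one that runs on two to eight.
fixed = Task("fixed", min_resources=4, max_resources=4)
elastic = Task("elastic", min_resources=2, max_resources=8)
job = Job("example", [fixed, elastic])
```

An embodiment that manages several resource categories might replace the two integers with one range per category.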
Because jobs 102 may be defined with multiple tasks, a job may use different levels of resources during the course of execution. For example, a job may have several tasks that may be performed in parallel. The job may be performed by assigning each task to a separate processor and have the job finish quickly. The job may also be performed by executing each task in succession on a single processor.
Jobs 102 may be placed in a job queue 104 prior to being executed. A prioritizer 106 may evaluate the jobs in the job queue 104 to determine which job may be executed next.
The prioritizer 106 may use different mechanisms and algorithms for determining which job to execute next. In some embodiments, each job may have a general priority, such as low, medium, and high. Some embodiments may use the length of time since the job has been submitted as another factor. An algorithm may be used to calculate a priority for each pending job, sort the queue, and identify the next job to be executed.
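One possible priority calculation combines the general priority level with the time a job has waited. The sketch below is an assumption for illustration: the numeric weights, the `age_weight` parameter, and the names `PendingJob`, `priority_score`, and `sort_queue` are hypothetical, and the specification does not prescribe any particular formula.

```python
from dataclasses import dataclass

# Hypothetical numeric weights for the three general priority levels.
LEVEL_WEIGHT = {"low": 0, "medium": 1, "high": 2}


@dataclass
class PendingJob:
    name: str
    level: str           # "low", "medium", or "high"
    submitted_at: float  # submission time in seconds


def priority_score(job, now, age_weight=1 / 3600):
    """Level weight plus a bonus that grows by one point per hour waited."""
    return LEVEL_WEIGHT[job.level] + age_weight * (now - job.submitted_at)


def sort_queue(queue, now):
    """Return the pending jobs with the next job to execute first."""
    return sorted(queue, key=lambda j: priority_score(j, now), reverse=True)


queue = [
    PendingJob("a", "low", submitted_at=0),
    PendingJob("b", "high", submitted_at=9000),
    PendingJob("c", "medium", submitted_at=8000),
]
ordered = sort_queue(queue, now=10000)
```

With these assumed weights, the long-waiting low priority job "a" sorts ahead of the recently submitted high priority job "b", which shows how waiting time can act as a second factor.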
The new job 108 selected by the prioritizer 106 may be analyzed by a new job analyzer 110. The new job analyzer 110 may determine the minimum resources that may be used to start the new job 108. When the minimum amount of resources becomes available, the new job 108 may be started.
The scheduler 112 may determine the allocation of resources across the various jobs. In some instances, the scheduler 112 may add resources to an executing job and in other instances the scheduler 112 may remove resources from a job. During execution, a current job analyzer 114 may analyze the current or running jobs 116 to determine if a job is using all of its allocated resources or if additional resources could be allocated to the job.
In many cases, a job may have multiple tasks that may be operated in parallel and may use additional resources during the period where multiple tasks are being executed in parallel. In other cases, a job may have tasks that are serial or sequentially dependent such that one task is performed after another task has completed. In jobs with many tasks with parallel and sequential dependencies, a job may use different amounts of resources during execution. During a period of massive parallel execution of tasks, the maximum amount of resources may be allocated to the job. Once the period has passed and the job enters a period where many tasks are sequentially dependent, the job may have more resources allocated than the job can use.
During execution of a job, the current job analyzer 114 may determine a maximum amount of resources that may be allocated for a job. The maximum amount of resources may be used to determine how many resources may be applied to the job to finish the job as quickly as possible, for example. The current job analyzer 114 may determine a minimum amount of resources that may be allocated for the same job. The minimum amount of resources may be used by the scheduler 112 to remove a resource so that a higher priority job may use the resource, for example. In this or other cases, the minimum amount of resources may be determined so that a job may be executed without causing a deadlock due to insufficient resources.
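A minimal sketch of how a current job analyzer might compute these bounds follows. It rests on two assumptions that are not stated in the specification: the maximum is taken as the sum of the per-task maximums over tasks that are running or ready to run, and the minimum is taken as the largest per-task minimum over all remaining tasks, so that any remaining task can run by itself without deadlock. The names `RemainingTask` and `loading_bounds` are hypothetical.

```python
from typing import NamedTuple


class RemainingTask(NamedTuple):
    name: str
    min_resources: int
    max_resources: int
    ready: bool  # True if running or if all dependencies are complete


def loading_bounds(tasks):
    """Return (minimum, maximum) resource loading for the remaining tasks."""
    if not tasks:
        return (0, 0)
    # Assumed rule: enough resources that any single remaining task can run.
    minimum = max(t.min_resources for t in tasks)
    # Assumed rule: every ready task running at its own maximum.
    maximum = sum(t.max_resources for t in tasks if t.ready)
    return (minimum, max(minimum, maximum))


bounds = loading_bounds([
    RemainingTask("a", 1, 2, ready=True),
    RemainingTask("b", 2, 8, ready=True),
    RemainingTask("c", 1, 1, ready=False),
])
```

Here `bounds` is `(2, 10)`: a scheduler could remove resources down to two, or add resources up to ten.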
The resource manager 118 may monitor and analyze the resources 120 to determine the current status of the various resources 120. In many cases, the allocated resources may include processors for cluster computing applications. In other cases, the allocated resources may include individual computers, various types of memory devices, network connections and bandwidth, various input or output devices, and software licenses or other computing resources. In this specification, any reference to cluster computing and allocating processor or computer resources is by way of example and not limitation.
The scheduler 112 may have many different types of algorithms and may use various factors in determining when to start a new job and how to allocate or apportion resources across various jobs. Each embodiment may use a different logic and/or formula for allocating resources. In some embodiments, a logic may be defined that emphasizes using as much of the resources as possible to complete a job that is executing as soon as possible. Other logic may take into consideration the priority of an executing job and allocate resources in favor of a higher priority job over a lower priority job. Still other embodiments may be designed to begin jobs as soon as possible by allocating resources to new jobs instead of executing jobs.
In some embodiments, the scheduler 112 may evaluate running jobs, allocate resources amongst the running jobs, and then allocate unused resources to any new jobs. Other embodiments may prioritize running jobs and pending jobs together using various algorithms and weighting schemes.
A graphical representation of a task sequence 202 for a job is shown on the left and a table of resource loading 204 is shown on the right.
The task sequence 202 starts at block 206. A first task 208 may be used to initialize the job. Corresponding with the first task 208, the first row of the table 204 indicates that the resource loading has a maximum of one and a minimum of one. The maximum and minimum resource loading may be the amount of resources that may be allocated to the job for that period of time. Since the task 208 is the one task that is operational, one resource may be assigned.
For the purposes of this simplified example, each task may be defined to use a single resource, which may be any type of computing resource. In other examples, each task may use multiple resources spanning different categories of resources. For example, a single task may use between two and sixteen processors, network access, and a software license while another task may use a single processor, a specific output device, and no software license. In the example of
Tasks 210, 212, and 214 may be performed in parallel after task 208 is completed. While tasks 210, 212, and 214 are being performed, the resource loading has a maximum of three and a minimum of one resource. When the resource loading is one, each of the tasks 210, 212, and 214 may be performed in sequence using the single resource. If more resources are allocated to the job, two or more tasks may be performed simultaneously. For example, if two resources are allocated to the job 202, tasks 210 and 212 may be performed simultaneously, and if three resources are allocated, tasks 210, 212, and 214 may be performed in parallel.
A job scheduler may allocate resources to various jobs based on priority. For example, if the job 202 had a higher priority than other jobs being executed, additional resources may be assigned up to the maximum resource loading so that the job 202 may be completed sooner. Conversely, if the job 202 had a lower priority than other jobs, either executing jobs or pending jobs, resources may be assigned to the other jobs but the minimum resource loading may be preserved for job 202 so that job 202 may continue to be executed.
Tasks 216, 218, and 220 are dependent on task 210 and are parallel tasks. Similarly, tasks 222 and 224 are parallel tasks and are dependent on task 214. During this period, the resource loading may be a maximum of five and a minimum of one.
Task 226 is dependent on tasks 216, 218, 220, 222, 224, and 212. The dependency of task 226 on the other tasks may be defined such that task 226 may not be started until all of the other tasks have been completed. During the period of task 226, the maximum resource loading and the minimum resource loading may be one. While the various tasks on which task 226 depends are completing, one or more tasks may complete before the remaining tasks. If, for example, five resources are allocated to the job and two of the tasks complete before the remaining tasks, three resources may be actively used by the remaining tasks but two resources may be excess and may not be used by the job 202. Such excess resources may be recovered by a job scheduler and assigned to other jobs once the tasks using the resources are complete.
Task 228 may be dependent on task 226 and may have a maximum and minimum resource loading of one. After task 228 is complete, the job 202 may end in block 230.
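The resource loading described for this task sequence can be reproduced from the dependencies alone. The sketch below encodes the dependencies stated in the text and groups tasks by dependency depth; the grouping rule and the names `deps`, `depth`, and `loading_table` are assumptions made for illustration, and each task is assumed to use a single resource as in the simplified example.

```python
# Each task maps to the tasks it depends on, as described in the text.
deps = {
    208: [],
    210: [208], 212: [208], 214: [208],
    216: [210], 218: [210], 220: [210],
    222: [214], 224: [214],
    226: [216, 218, 220, 222, 224, 212],
    228: [226],
}


def depth(task, deps, memo):
    """Length of the longest dependency chain leading to the task."""
    if task not in memo:
        memo[task] = 1 + max((depth(d, deps, memo) for d in deps[task]), default=-1)
    return memo[task]


def loading_table(deps):
    """Return one (tasks, maximum, minimum) row per dependency depth."""
    memo, stages = {}, {}
    for task in deps:
        stages.setdefault(depth(task, deps, memo), []).append(task)
    # With one resource per task, the maximum is the number of tasks in the
    # stage, and the minimum is one because the tasks can run in sequence.
    return [(sorted(ts), len(ts), 1) for _, ts in sorted(stages.items())]


table = loading_table(deps)
```

The resulting rows have maximums of 1, 3, 5, 1, and 1 with a minimum of 1 throughout, matching the loading described for tasks 208 through 228.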
A scheduler may allocate resources to a job using various algorithms or logic. For example, at the beginning of the job 202, a scheduler may assign a maximum of five resources to the job 202 so that the job 202 may complete quickly, even though the first few tasks may complete without using the maximum resources. In other embodiments, a scheduler may assign one resource initially, and ramp up to five resources during the execution of tasks 216 through 224.
A scheduler may remove resources that are allocated to a job and allocate those resources to another job. For example, if five resources have been assigned to the job 202 and tasks 216 and 218 have been completed and tasks 220, 222, and 224 are in process, the job 202 may make use of three resources but not all five.
In many cases, a scheduler may evaluate the future resource loading that may be assigned to a job to determine the allocation of resources between jobs. For example, a scheduler may evaluate the job 202 and determine that the maximum resource loading is five and allocate five resources to the job 202, even though five resources may not be fully utilized until tasks 216 through 224 are being performed.
In some embodiments, a scheduler may allocate resources at various stages of a job execution. For example, a scheduler may allocate a single resource to job 202, which may continue using a single resource until task 214 is completed. At that point, more resources may become available and the scheduler may allocate additional resources up to the maximum of five resources.
In some cases, a resource may become available and a scheduler may allocate the resource to a job that is operating below its maximum. In making such allocations, various factors may be evaluated, including the priority of the job, the difference between the current resource allocation and a maximum allocation for a job, the difference between the largest and smallest maximum resource loading for a job, or any other factor.
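As one hedged illustration of weighing such factors, the sketch below gives a freed resource to the highest priority job that is below its maximum and breaks ties by the largest gap between current and maximum allocation. The tie-breaking order, the dictionary keys, and the name `pick_recipient` are assumptions, not part of the specification.

```python
def pick_recipient(jobs):
    """Return the name of the job that should receive one freed resource, or None."""
    # Only jobs operating below their maximum can use another resource.
    candidates = [j for j in jobs if j["allocated"] < j["maximum"]]
    if not candidates:
        return None
    # Assumed ordering: priority first, then remaining headroom.
    best = max(candidates, key=lambda j: (j["priority"], j["maximum"] - j["allocated"]))
    return best["name"]


jobs = [
    {"name": "a", "priority": 2, "allocated": 3, "maximum": 3},  # already at maximum
    {"name": "b", "priority": 1, "allocated": 1, "maximum": 5},  # headroom of four
    {"name": "c", "priority": 1, "allocated": 2, "maximum": 3},  # headroom of one
]
recipient = pick_recipient(jobs)
```

In this example the resource goes to job "b": job "a" has the highest priority but cannot use more resources, and "b" has more headroom than "c".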
When resources are available, a scheduler may evaluate if a new job may be started. Each scheduler may have different logic or algorithms for determining when to start jobs and how to allocate resources. In some embodiments, a scheduler may attempt to allocate resources amongst executing jobs before attempting to allocate resources to a new job. In other embodiments, a scheduler may use an algorithm or logic that evaluates the priority of existing jobs and pending jobs and may allocate resources to a new job even though existing jobs may be capable of using those resources.
The example of
While other embodiments may use different logic or methods for allocating resources, embodiment 300 is designed to allocate available resources to executing jobs based on the priority of the executing jobs. Once all resources are allocated to executing jobs, the highest priority waiting job may be started when the minimum amount of resources is available for the waiting job.
Embodiment 300 is an example of one algorithm. Those skilled in the art may appreciate that changes may be made to the embodiment 300 to yield different results, based on a specific implementation. Further, embodiment 300 may be adapted for the resource allocation of different types of resources that may be used by various tasks. In some embodiments, different types of resources may be managed by a scheduler and each job may use different amounts of each type of resource. Such embodiments may use similar or different logic or algorithms to allocate the various types of resources amongst jobs.
The current resources are analyzed in block 302. In many embodiments, a resource analyzer may determine how many resources are present, which resources are allocated, which resources are being utilized, or other information about the current resources. In some situations, the resources may occasionally come on and off line and may not be permanently available.
For each executing job in block 304, a maximum resource loading is determined in block 306, a minimum resource loading in block 308, and a priority for the job in block 310. The maximum and minimum resource loading may be determined for the entire length of a job or for a short section of the job, such as the next several tasks.
In some cases, embodiment 300 or a similar algorithm may be performed many times during the execution of a job, so that resources may be allocated and removed from a job several times over the course of execution. In such cases, the maximum and minimum resource loading for a job may be evaluated for a period shorter than the length of the job execution. For example, when resources are allocated on a periodic basis such as every ten minutes, the maximum and minimum resource loading for a job may be evaluated for the next ten or twenty minutes of execution.
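Evaluating the loading over a short window can be sketched as follows, assuming the scheduler has a projected timeline of execution phases for the job. The phase durations in the example, the tuple layout, and the name `window_bounds` are all hypothetical.

```python
def window_bounds(phases, start, window):
    """Return (minimum, maximum) loading over the phases that overlap the window.

    phases: list of (phase_start, phase_end, min_load, max_load) in minutes.
    """
    end = start + window
    overlapping = [p for p in phases if p[0] < end and p[1] > start]
    if not overlapping:
        return (0, 0)
    # The job must satisfy the largest minimum and may use up to the largest
    # maximum of any phase it is expected to enter within the window.
    return (max(p[2] for p in overlapping), max(p[3] for p in overlapping))


# Hypothetical timeline: a serial start, two parallel phases, a serial finish.
phases = [(0, 10, 1, 1), (10, 30, 1, 3), (30, 60, 1, 5), (60, 70, 1, 1)]
next_twenty = window_bounds(phases, start=0, window=20)
whole_job = window_bounds(phases, start=0, window=70)
```

Over the next twenty minutes the bounds are `(1, 3)`, while over the whole job they are `(1, 5)`, so a windowed evaluation would not reserve five resources before the job can use them.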
In some cases, the maximum and minimum resource loading may be recalculated when the resource loading changes substantially between jobs. For example, a recalculation may be performed if a task completion or several task completions cause the resource loading to increase or decrease substantially. Such an embodiment may be used to minimize recalculations of resource loading that do not substantially change the existing loading.
If the resources allocated to the job exceed the maximum for the job in block 312, the excess resources may be identified as unused in block 314. The excess resources may be allocated to other jobs in later steps of the algorithm.
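The check in blocks 312 and 314 can be sketched as a pass over the executing jobs that trims each allocation to the job's current maximum. The dictionary keys and the name `reclaim_excess` are hypothetical, and resources are modeled as simple counts.

```python
def reclaim_excess(jobs):
    """Trim each job to its maximum and return the number of resources freed."""
    unused = 0
    for job in jobs:
        excess = job["allocated"] - job["maximum"]
        if excess > 0:
            job["allocated"] = job["maximum"]
            unused += excess
    return unused


# A job holding five resources when only three tasks remain in process,
# alongside a job that is below its maximum.
jobs = [
    {"name": "j1", "allocated": 5, "maximum": 3},
    {"name": "j2", "allocated": 1, "maximum": 4},
]
freed = reclaim_excess(jobs)
```

Here two resources are identified as unused and job "j1" keeps three; job "j2" is left untouched.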
In block 316, the executing jobs may be sorted by priority. Each embodiment may have different mechanisms for determining priority for jobs. In some cases, a user may determine a priority for a job prior to submittal. In other cases, the oldest jobs that are executing may be given higher priority than newer jobs. Still other cases may use other criteria or formulas for determining a priority.
For each job in descending priority in block 318, if the resources allocated are below the maximum in block 320, unused resources may be allocated to the job up to the maximum for the job in block 322. The steps of blocks 318, 320, and 322 allocate the various unused resources to the executing jobs so that the executing jobs may complete quickly, based on priority.
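Blocks 316 through 322 can be sketched as a single loop over the executing jobs sorted by priority. The dictionary keys and the name `allocate_unused` are hypothetical, and a larger number is assumed to mean a higher priority.

```python
def allocate_unused(jobs, unused):
    """Give unused resources to jobs in descending priority; return what is left."""
    for job in sorted(jobs, key=lambda j: j["priority"], reverse=True):
        if unused == 0:
            break
        # Grant only up to the job's maximum resource loading.
        grant = min(unused, job["maximum"] - job["allocated"])
        if grant > 0:
            job["allocated"] += grant
            unused -= grant
    return unused


jobs = [
    {"name": "lo", "priority": 1, "allocated": 1, "maximum": 5},
    {"name": "hi", "priority": 3, "allocated": 1, "maximum": 3},
]
left = allocate_unused(jobs, unused=3)
```

With three unused resources, job "hi" is filled to its maximum of three first and the remaining resource goes to job "lo".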
In block 324, the input queue is analyzed. The input queue may contain jobs that have not yet been executed. For each job in the input queue in block 326, a priority is determined in block 328. The priority for incoming jobs may be determined using any criteria or factor, including the length of time in the input queue, overall priority, or other factors.
The input queue is sorted in block 330 to determine the next job to be started. The minimum resources to start the next job are determined in block 332.
If any unused resources are available in block 334 and enough unused resources are available to start the new job in block 336, the new job is started in block 338. The process waits for a task to finish in block 340 before continuing.
If no resources remain in block 334 or if the remaining resources are not sufficient to start the new job in block 336, the process waits at block 340 until a task has finished.
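The decision in blocks 324 through 338 can be sketched as below. Only the highest priority waiting job is considered, as in embodiment 300; the wait at block 340 is not modeled. The dictionary keys and the name `try_start_next` are hypothetical.

```python
def try_start_next(queue, unused):
    """Start the highest priority waiting job if its minimum fits.

    Returns (started_job_name_or_None, remaining_unused).
    """
    if not queue:
        return None, unused
    nxt = max(queue, key=lambda j: j["priority"])
    if unused > 0 and unused >= nxt["minimum"]:
        queue.remove(nxt)
        return nxt["name"], unused - nxt["minimum"]
    # Not enough resources: the caller would wait for a task to finish.
    return None, unused


queue = [
    {"name": "x", "priority": 1, "minimum": 1},
    {"name": "y", "priority": 5, "minimum": 4},
]
first_try = try_start_next(queue, unused=2)
second_try = try_start_next(queue, unused=4)
```

With two unused resources nothing starts, because job "y" is next and needs four, even though job "x" would fit. Once four resources are free, "y" starts. An embodiment that favors starting jobs early might instead fall through to lower priority jobs that fit.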
The embodiment 300 is an algorithm that is designed to allocate available resources to existing jobs before starting a new job. Other embodiments may be designed so that high priority jobs in the job queue may be started even when available resources could be allocated to running or executing jobs.
Embodiment 300 is designed to be executed each time a task is completed. Such an embodiment may be useful in a situation where a task takes a relatively long time to complete. In embodiments where the tasks are short, the analysis of embodiment 300 may consume a large amount of overhead and may become unwieldy. In such cases, an embodiment may have an algorithm that is run when a job completes execution, when a certain number of tasks have been completed, or on a periodic time basis.
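The alternative triggers mentioned above can be sketched as a small policy object that decides when to rerun the allocation analysis. The class name `RescheduleTrigger` and its parameters are hypothetical; setting `task_batch=1` with no period corresponds to running the analysis on every task completion.

```python
class RescheduleTrigger:
    """Decide when to rerun the allocation analysis (illustrative sketch)."""

    def __init__(self, task_batch=1, period=None):
        self.task_batch = task_batch  # rerun after this many task completions
        self.period = period          # or after this many seconds, if set
        self._completed = 0
        self._last_run = 0.0

    def should_run(self, now, task_completed=False, job_completed=False):
        if task_completed:
            self._completed += 1
        due = (
            job_completed
            or self._completed >= self.task_batch
            or (self.period is not None and now - self._last_run >= self.period)
        )
        if due:
            # Reset the counters so the next analysis is measured from now.
            self._completed = 0
            self._last_run = now
        return due


trigger = RescheduleTrigger(task_batch=3, period=600)
results = [
    trigger.should_run(10, task_completed=True),
    trigger.should_run(20, task_completed=True),
    trigger.should_run(30, task_completed=True),  # third completion
    trigger.should_run(100),
    trigger.should_run(630),                      # 600 seconds since last run
]
```

Batching completions or using a period trades some responsiveness for lower analysis overhead when tasks are short.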
The foregoing description of the subject matter has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the subject matter to the precise form disclosed, and other modifications and variations may be possible in light of the above teachings. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and various modifications as are suited to the particular use contemplated. It is intended that the appended claims be construed to include other alternative embodiments except insofar as limited by the prior art.