A cloud infrastructure can include various resources, including computing resources, storage resources, and/or communication resources, that can be rented by customers (also referred to as tenants) of the provider of the cloud infrastructure. By using the resources of the cloud infrastructure, a tenant does not have to deploy the tenant's own resources for implementing a particular platform for performing target operations. Instead, the tenant can pay the provider of the cloud infrastructure for resources that are used by the tenant. The “pay-as-you-go” arrangement of using resources of the cloud infrastructure provides an attractive and cost-efficient option for tenants that do not desire to make substantial up-front investments in infrastructure.
The following description illustrates various examples with reference to the following figures:
A cloud infrastructure can include different types of computing resources that can be utilized by, or otherwise provisioned to, a tenant for deploying a computing platform to process the tenant's workload. A tenant can refer to an individual or an enterprise (e.g., a business concern, an educational organization, or a government agency). The computing platform (e.g., the computing resources) of the cloud infrastructure is available to and accessible by the tenant over a network, such as the Internet, a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), and so forth.
Computing resources can include computing nodes, where a “computing node” can refer to a computer, a collection of computers, a processor, or a collection of processors. In some cases, computing resources can be provisioned to a tenant according to determinable units offered by the cloud infrastructure system. For example, in some implementations, computing resources can be categorized according to processing capacities of different sizes. As an example, computing resources can be provisioned as virtual machines (formed of machine-readable instructions) that emulate a physical machine. A virtual machine can execute an operating system and applications like a physical machine. Multiple virtual machines can be hosted by a physical machine, and these virtual machines can share the physical resources of that machine. Virtual machines can be offered in different sizes, such as small, medium, and large. A small virtual machine has less processing capacity than a medium virtual machine, which in turn has less processing capacity than a large virtual machine. As examples, a large virtual machine can have twice the processing capacity of a medium virtual machine, and a medium virtual machine can have twice the processing capacity of a small virtual machine. The processing capacity of a virtual machine can refer to its central processing unit (CPU) and memory capacity, for example.
A provider of a cloud infrastructure can charge different prices for use of different resources. For example, the provider can charge a higher price for a large virtual machine, a medium price for a medium virtual machine, and a lower price for a small virtual machine. In a more specific example, the price of the large virtual machine can be twice the price of the medium virtual machine, and the price of the medium virtual machine can be twice the price of the small virtual machine. The price charged for a platform configuration can also depend on the amount of time that resources of the platform configuration are used by a tenant.
Also, the price charged by a provider to a tenant can vary based on the cluster size selected by the tenant. If the tenant selects a larger number of virtual machines to include in a cluster, then the cloud infrastructure provider may charge a higher price to the tenant, such as on a per-virtual-machine basis.
The configuration of computing resources selected by a tenant, such as processor sizes, virtual machines, computing nodes, network bandwidth, storage capacity, the number of virtual machines in a cluster, and the like, may be referred to as a platform configuration. The choice of platform configuration can affect the cost or service level of processing a workload.
A tenant is thus faced with a variety of choices with respect to resources available in the cloud infrastructure, where the different choices are associated with different prices. Intuitively, according to the examples discussed above, it may seem that a large virtual machine can execute a workload twice as fast as a medium virtual machine, which in turn can execute a workload twice as fast as a small virtual machine. Similarly, it may seem that a 40-node cluster can execute a workload four times as fast as a 10-node cluster.
As an example, the provider of the cloud infrastructure may charge a tenant the same price for either of the following two platform configurations: (1) a 40-node cluster that uses 40 small virtual machines; or (2) a 10-node cluster that uses 10 large virtual machines. Although it may seem that platform configurations (1) and (2) would execute a tenant's workload with the same performance, in practice the performance of the workload may differ between them. This difference may be due to constraints associated with network bandwidth and persistent storage capacity in each platform configuration. Network bandwidth can refer to the available communication bandwidth for performing communications among computing nodes. Persistent storage capacity can refer to the storage capacity available in a persistent storage subsystem.
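To make the pricing arithmetic concrete, the following Python sketch assumes that a small VM costs 1 price unit per time unit and applies the doubling ratios from the example above (medium = 2 units, large = 4 units). The unit price, the makespans, and the function names are illustrative assumptions. Under these assumptions both configurations cost the same per time unit, so which one is cheaper overall depends on how long each keeps its resources reserved.

```python
# Assumed per-time-unit prices: each VM size costs twice the next smaller size.
PRICE_PER_TIME_UNIT = {"small": 1.0, "medium": 2.0, "large": 4.0}

def cluster_rate(num_vms: int, vm_type: str) -> float:
    """Price per time unit for a cluster of num_vms VMs of one type."""
    return num_vms * PRICE_PER_TIME_UNIT[vm_type]

def cluster_cost(num_vms: int, vm_type: str, makespan: float) -> float:
    """Total charge when the cluster is reserved for the whole makespan."""
    return cluster_rate(num_vms, vm_type) * makespan

print(cluster_rate(40, "small"), cluster_rate(10, "large"))  # 40.0 40.0
# Hypothetical makespans: same rate, so the faster configuration is cheaper overall.
print(cluster_cost(40, "small", 3.0), cluster_cost(10, "large", 2.0))  # 120.0 80.0
```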
Increasing the number of computing nodes and virtual machines may not lead to a corresponding increase in persistent storage capacity and network bandwidth. Accordingly, a workload that involves a large amount of network communication may perform worse on a platform configuration that distributes the workload across a larger number of computing nodes and virtual machines, for example. Since the price charged to a tenant may depend in part on the amount of time the resources of the cloud infrastructure are reserved for use by the tenant, it may be beneficial to select a platform configuration that reduces that reserved time.
Selecting a platform configuration in a cloud infrastructure can become even more challenging when a performance objective is to be achieved. For example, one performance objective may be to reduce (or minimize) the overall completion time (referred to as a “makespan”) of the workload. A makespan may be measured from the time a workload begins to when the workload is completed.
Another challenge in selecting a platform configuration for a cluster is that different types of jobs may perform differently depending on the platform configuration. That is, some types of workloads may perform better under one platform configuration and other types of workloads may perform better under other platform configurations. Accordingly, for workloads that include jobs of differing types, a single homogeneous set-up (e.g., a cluster in which each virtual machine is configured with the same platform configuration) might not always perform better than a heterogeneous set-up (e.g., a set-up that includes multiple clusters, where the virtual machines in different clusters may be configured with different platform configurations). In turn, cost savings may vary across different platform configurations. For example, executing one application (e.g., Kmeans) on large VM instances can lead to higher cost savings than executing another application (e.g., TeraSort) on small VM instances.
In one aspect, a computing system may receive a platform configuration request. A platform configuration request may be a message, sent by a tenant system, that requests configuration of a cloud infrastructure to execute a workload (a set of jobs). In some cases, the platform configuration request can specify a job list and a quality of service metric. The computing system may then split the job list into a first job sub-list and a second job sub-list. The computing system may generate a heterogeneous simulation result that includes a heterogeneous platform configuration and a cost. The heterogeneous simulation result can be based on a heterogeneous simulation, which can include a first sub-simulation of the first job sub-list on a first platform configuration and a second sub-simulation of the second job sub-list on a second platform configuration. The computing system may select the heterogeneous platform configuration of the heterogeneous simulation result over other heterogeneous platform configurations based on a comparison of the cost of the heterogeneous simulation result against the costs of other heterogeneous simulation results. The heterogeneous platform configuration may then be communicated to the tenant system.
In another aspect, a system may include a processor and a machine-readable storage device. The machine-readable storage device may comprise instructions that, when executed, cause the processor to receive a job list that specifies jobs of a prospective workload. The processor, executing the instructions, may then split the job list into a first job sub-list and a second job sub-list. A first homogeneous simulation result may be generated based on a simulation of the first job sub-list using a first homogeneous platform configuration. Further, a second homogeneous simulation result can be generated based on a simulation of the second job sub-list using a second homogeneous platform configuration. Then, a heterogeneous simulation result can be generated based on the first homogeneous platform configuration and the second homogeneous platform configuration. A cloud infrastructure can provision computer resources according to the heterogeneous simulation result.
In yet another aspect, a heterogeneous evaluation system may receive a platform configuration request from the tenant system. The platform configuration request can specify a job list and a quality of service metric. The heterogeneous evaluation system may then generate a heterogeneous platform configuration selection based on simulations of sub-lists formed from the job list. The heterogeneous platform configuration selection can specify a first platform configuration for a first cluster and a second platform configuration for a second cluster. The first cluster may be assigned to execute one job sub-list, and the second cluster may be assigned to execute another job sub-list. The heterogeneous evaluation system can communicate the heterogeneous platform configuration selection to the tenant system.
The tenant system 106 is communicatively coupled to the cloud infrastructure system 104. A tenant system can refer to a computer or collection of computers associated with a tenant. Through the tenant system 106, a tenant can submit a platform configuration request to the cloud infrastructure system 104 to rent the resources of the cloud infrastructure system 104 through, for example, virtual machines executing on the computing nodes 102. A platform configuration request for resources of the cloud infrastructure system 104 can be submitted by a tenant system 106 to a heterogeneous evaluation system 108 of the cloud infrastructure system 104. The platform configuration request can identify a workload of jobs (e.g., a job list) to be performed, and can also specify a target makespan, deadline, and/or a cost the tenant is willing to spend on executing a workload.
The heterogeneous evaluation system 108 may be a computer system that interfaces with the tenant system 106 and the cloud infrastructure system 104 and that is configured to select a heterogeneous platform configuration from among multiple heterogeneous platform configurations that can be hosted on the cloud infrastructure system 104.
In some cases, a selected heterogeneous platform configuration can be presented in a heterogeneous platform configuration selection 116 that the tenant can use to purchase computing resources from the cloud infrastructure system 104. The heterogeneous platform configuration selection 116 may present the heterogeneous platform configuration as a user-selectable option. Example methods and operations for selecting a heterogeneous platform configuration are discussed in greater detail below. Once the heterogeneous platform configuration is selected by the heterogeneous evaluation system 108 (as may be initiated by a tenant through the heterogeneous platform configuration selection 116), the resources that are part of the selected heterogeneous platform configuration (including a cluster of computing nodes 102 of a given cluster size, and virtual machines of a given size) are made accessible to the tenant system 106 to perform a workload of the tenant system 106.
Operationally, the heterogeneous evaluation system 108 may receive a platform configuration request from the tenant system 106. The platform configuration request can specify a job list and a quality of service metric. The heterogeneous evaluation system 108 may then generate a heterogeneous platform configuration selection based on simulations of sub-lists formed from the job list. The heterogeneous platform configuration selection can specify a first platform configuration for a first cluster and a second platform configuration for a second cluster. The first cluster may be assigned to execute one job sub-list, and the second cluster may be assigned to execute another job sub-list.
The heterogeneous evaluation system 108 can communicate the heterogeneous platform configuration selection 116 to the tenant system 106.
By way of example and not limitation, the tenant system 106 may rent computing resources from the cloud infrastructure system 104 to host or otherwise execute a workload that includes MapReduce jobs. Before discussing further aspects of examples of the cloud infrastructure service 100, MapReduce is now discussed. MapReduce jobs operate according to a MapReduce framework that provides for parallel processing of large amounts of data in a distributed arrangement of machines, such as virtual machines 120, as one example. In a MapReduce framework, a MapReduce job is divided into multiple map tasks and multiple reduce tasks, which can be executed in parallel by computing nodes. The map tasks operate according to a user-defined map function, while the reduce tasks operate according to a user-defined reduce function. In operation, map tasks are used to process input data and output intermediate results. Reduce tasks take as input partitions of the intermediate results to produce outputs, based on a specified reduce function that defines the processing to be performed by the reduce tasks. More formally, in some examples, the map tasks process input key-value pairs to generate a set of intermediate key-value pairs. The reduce tasks produce an output from the intermediate key-value pairs. For example, the reduce tasks can merge the intermediate values associated with the same intermediate key.
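As a rough illustration of the map and reduce functions just described, the following single-process Python sketch runs the classic word-count example. It only mimics the data flow (map to intermediate key-value pairs, group by intermediate key, reduce) and does not model parallel execution across virtual machines; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_fn(_key: int, line: str) -> Iterator[tuple[str, int]]:
    """User-defined map function: emit an intermediate (word, 1) pair per word."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(key: str, values: Iterable[int]) -> tuple[str, int]:
    """User-defined reduce function: merge all values for one intermediate key."""
    return key, sum(values)

def run_mapreduce(lines: list[str]) -> dict[str, int]:
    # Map stage: each input pair (line number, line) yields intermediate pairs,
    # which are grouped by intermediate key (the "shuffle").
    intermediate: dict[str, list[int]] = defaultdict(list)
    for key, line in enumerate(lines):
        for k, v in map_fn(key, line):
            intermediate[k].append(v)
    # Reduce stage: one reduce call per intermediate key.
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

print(run_mapreduce(["the cat", "The dog"]))  # {'the': 2, 'cat': 1, 'dog': 1}
```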
Although the foregoing may reference provisioning resources from the cloud infrastructure system to execute MapReduce jobs, it is noted that techniques or mechanisms according to other implementations contemplated by this disclosure can be applied to select platform configurations for workloads that include other types of jobs.
The method 300 may begin at operation 302 when the heterogeneous platform configuration selector 216 receives a platform configuration request. The platform configuration request may, in some cases, specify a job list and a quality of service metric from a tenant system. The quality of service metric may be an expected makespan, a cost, or the like. In some cases, a tenant system may transmit the platform configuration request to the heterogeneous platform configuration selector in response to a user-initiated request, which may occur when the user submits the job list to the heterogeneous evaluation system for pricing.
At operation 304, the heterogeneous platform configuration selector may split the job list into a first job sub-list and a second job sub-list. In some cases, the split point for the job list may be selected systematically. For example, as described below, the job list may be ranked according to a preference function that measures the preference a given job has for a given platform configuration. If the job list is sorted in this way, the split point may initially be set to the beginning of the job list. Further iterations of the operations of the method 300 may then update the split point so that each iteration considers a different division of the job list into the first job sub-list and the second job sub-list.
At operation 306, the heterogeneous platform configuration selector may generate a heterogeneous simulation result based on a heterogeneous simulation. The heterogeneous simulation result may specify a heterogeneous platform configuration and a cost. A heterogeneous simulation may include multiple sub-simulations, such as: (a) a first sub-simulation of the first job sub-list on a first platform configuration, and (b) a second sub-simulation of the second job sub-list on a second platform configuration. For example, the first sub-simulation may be performed by a simulator configured to simulate the first job sub-list on a cluster of small VM instances, while the second sub-simulation may be performed by a simulator configured to simulate the second job sub-list on a cluster of large VM instances.
At operation 308, the heterogeneous platform configuration selector may select the heterogeneous platform configuration of the heterogeneous simulation result over other heterogeneous platform configurations based on a comparison of the cost of the heterogeneous simulation result against the costs of other heterogeneous simulation results. For example, the heterogeneous platform configuration selector may select a heterogeneous platform configuration from a first heterogeneous simulation result over another heterogeneous platform configuration from a second heterogeneous simulation result if the first heterogeneous simulation result has a lower cost.
At operation 310, the heterogeneous platform configuration selector may communicate the heterogeneous platform configuration to the tenant system. The heterogeneous platform configuration selector may communicate the selected heterogeneous platform configuration to the tenant system in a heterogeneous platform configuration selection (e.g., the heterogeneous platform configuration selection 116).
Accordingly, the heterogeneous evaluation system 108 may provide a tenant with a comparatively simple mechanism to select a heterogeneous platform configuration for executing a job list on a cloud infrastructure with multiple clusters.
In some cases, a heterogeneous platform configuration selector may utilize a cost matrix to select a heterogeneous platform configuration. A cost matrix is now discussed in greater detail. In some cases, a cost matrix may include data that characterizes a job's preference for a given platform configuration. This preference may be used to estimate the cost savings of splitting the execution of the jobs in a job list over multiple clusters. To estimate the cost savings for different platform configuration choices, assume that a relatively preferred homogeneous platform configuration for processing a given job list within deadline $D$ is a cluster with $K_H$ VM instances of type $T_H$. Let the processing makespan on this cluster be $D_H$, and let $Price_H$ be the price of a VM instance of type $T_H$ per time unit. Then the cost of the relatively preferred homogeneous platform configuration may be determined according to:
$$B = D_H \cdot K_H \cdot Price_H$$
Let $Price_{type}$ be the price of a VM instance of a given type per time unit, where $type \in \{small, medium, large\}$. Then, within budget $B$ and deadline $D$, the customer can rent at most $N^{max}_{type}$ VM instances of that type:
$$N^{max}_{type} = \frac{B}{D \cdot Price_{type}}$$
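A worked example of these two formulas, using hypothetical numbers: suppose the preferred homogeneous configuration is 10 large instances priced at 4 units per time unit that finish in $D_H = 8$ time units, and the deadline is $D = 10$. Then $B = 8 \cdot 10 \cdot 4 = 320$, and the same budget covers at most 32 small, 16 medium, or 8 large instances for the whole deadline. The sketch below rounds $N^{max}_{type}$ down to a whole number of instances, which is an assumption not stated in the formula; the prices and function names are illustrative.

```python
import math

def budget(d_h: float, k_h: int, price_h: float) -> float:
    """B = D_H * K_H * Price_H for the preferred homogeneous configuration."""
    return d_h * k_h * price_h

def max_instances(b: float, deadline: float, price_type: float) -> int:
    """N_type^max = B / (D * Price_type), rounded down (an assumption)."""
    return math.floor(b / (deadline * price_type))

prices = {"small": 1.0, "medium": 2.0, "large": 4.0}  # hypothetical prices
b = budget(d_h=8.0, k_h=10, price_h=prices["large"])
print(b, {t: max_instances(b, 10.0, p) for t, p in prices.items()})
# 320.0 {'small': 32, 'medium': 16, 'large': 8}
```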
Using the above calculations, a heterogeneous platform configuration selector can compute a cost metric, referred to as $Cost_{type}(J)$, for each VM instance type and each job $J$ in the set $W = \{J_1, J_2, \ldots, J_n\}$. For example, the heterogeneous platform configuration selector can compute a cost value according to:

$$Cost_{type}(J) = \sum_{i=1}^{N^{max}_{type}} Cost^{i}_{type}(J)$$
In the above equation, $Cost^{i}_{type}(J)$ can denote the cost of executing a given job (e.g., $J$) on a cluster with $i$ VM instances of type $type$. As the above equation shows, a cost value (e.g., $Cost_{type}(J)$) may be the summation of $Cost^{i}_{type}(J)$ across clusters of instances of type $type$ over different cluster sizes. In this way, a cost value may be used to approximate the cost of executing the job using a given instance type over different cluster sizes.
To prioritize (e.g., rank or sort) each job's preference for different VM types, the heterogeneous platform configuration selector can generate or otherwise utilize a cost matrix that predicts, for a job, the cost of executing the job on various platform configurations. For example, in some cases, the cost matrix may include $Cost_{type}(J)$ for each job in a workload and for each platform configuration available in a cloud infrastructure (e.g., for instance sizes, small, medium, or large). Additionally or alternatively, the cost matrix may include rank values that approximate the preference a respective job has for a given platform configuration. Rank values may be calculated, in some cases, according to:
$$
\begin{aligned}
Rank_{s\&m}(J) &= Cost_{small}(J) - Cost_{medium}(J)\\
Rank_{s\&l}(J) &= Cost_{small}(J) - Cost_{large}(J)\\
Rank_{m\&l}(J) &= Cost_{medium}(J) - Cost_{large}(J)
\end{aligned}
$$
Each rank value compares two instance types for a given job $J$, and because it is a difference of costs, a negative value means the smaller instance type is estimated to be cheaper for that job. The value of $Rank_{s\&l}(J)$ may indicate a possible preference between small VM instances and large VM instances: the more negative the value, the stronger the suggested preference for small VM instances; the more positive the value, the stronger the suggested preference for large VM instances; and values closer to 0 may reflect less sensitivity to the platform choice. The value of $Rank_{s\&m}(J)$ may be interpreted in the same way for the choice between small and medium VM instances, and the value of $Rank_{m\&l}(J)$ for the choice between medium and large VM instances.
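One row of the cost matrix and its rank values can be sketched in Python as follows. The per-cluster-size costs $Cost^{i}_{type}(J)$ are hypothetical inputs (in practice they would come from simulation), and the lists have different lengths because $N^{max}_{type}$ differs by instance type. For this hypothetical job all three rank values are positive, which under the interpretation above points toward larger instances.

```python
def cost_value(costs_by_size: list[float]) -> float:
    """Cost_type(J): sum of Cost_type^i(J) for cluster sizes i = 1..N_type^max."""
    return sum(costs_by_size)

def rank_values(row: dict[str, float]) -> dict[str, float]:
    """Rank values for one job from its cost-matrix row {type: Cost_type(J)}."""
    return {
        "s&m": row["small"] - row["medium"],
        "s&l": row["small"] - row["large"],
        "m&l": row["medium"] - row["large"],
    }

# Hypothetical Cost_type^i(J) values for one job; index i-1 is cluster size i.
per_size = {"small": [4.0, 4.0, 4.0], "medium": [5.0, 5.0], "large": [9.0]}
row = {vm_type: cost_value(c) for vm_type, c in per_size.items()}
print(row, rank_values(row))
# {'small': 12.0, 'medium': 10.0, 'large': 9.0} {'s&m': 2.0, 's&l': 3.0, 'm&l': 1.0}
```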
To begin, the heterogeneous platform configuration selector may sort a job list 410. For example, the heterogeneous platform configuration selector may sort the jobs $J \in \{J_1, \ldots, J_n\}$ in ascending order of $Rank_{s\&l}(J)$. As discussed above, the values of $Rank_{s\&l}(J)$ may be stored in or otherwise derived from the cost matrix. After sorting, the jobs at the beginning of the sorted job list may have a preference for executing on small VM instances, whereas jobs at the end of the sorted list may have less of a preference (or, in some cases, an aversion) for executing on small VM instances.
Then the heterogeneous platform configuration selector may split the sorted job list into two sub-lists: a first sub-list to be executed on a cluster with small instances and a second sub-list to be executed on a cluster with large instances. In one example, the heterogeneous platform configuration selector may split the job list at a split point. The split point may be initialized to be before $J_1$, so that the first job sub-list is empty and the second job sub-list includes $J_1$ through $J_n$. Alternatively, the split point may be initialized to be after $J_1$, so that the first job sub-list includes $J_1$ and the second job sub-list includes $J_2$ through $J_n$.
The heterogeneous platform configuration selector may then cause a first simulator 420 to simulate an execution of the jobs in the first job sub-list on a homogeneous cluster configured with small VM instances. The heterogeneous platform configuration selector may also cause a second simulator 422 to simulate an execution of the jobs in the second job sub-list on another homogeneous cluster configured with large VM instances. Both simulators 420, 422 may receive the user-supplied quality of service metric so that each can select a corresponding platform configuration (e.g., a number of instances) that satisfies the quality of service metric at minimal cost. Simulating according to a homogeneous platform configuration is discussed in greater detail below.
The simulators 420, 422 may generate homogeneous simulation results 430, 432, respectively. Each homogeneous simulation result may include a homogeneous platform configuration (e.g., a number of instances for an instance type) and a cost to execute the corresponding job sub-list with that platform configuration.
From the homogeneous simulation results 430, 432, the heterogeneous platform configuration selector may generate a heterogeneous simulation result 440 by aggregating them. Aggregating the homogeneous simulation results may include creating a heterogeneous simulation result that specifies the platform configuration (e.g., K-Small) of the first homogeneous simulation result 430, the platform configuration (e.g., K-Large) of the second homogeneous simulation result 432, and a cost that is the sum of the costs specified in the first homogeneous simulation result (e.g., Cost-Small) and the second homogeneous simulation result (e.g., Cost-Large).
The heterogeneous platform configuration selector may then perform further iterations of the method 400, as shown by loop 450. In executing the loop 450, the heterogeneous platform configuration selector may select another split point and create new job sub-lists. This can be done by incrementing the split point, with the first job sub-list being the jobs on one side of the split point and the second job sub-list being the jobs on the other side. An iteration may also include executing the simulators 420, 422 on the new job sub-lists created from the new split point, and creating another heterogeneous simulation result from the new homogeneous simulation results. In some cases, the heterogeneous simulation results for each iteration are stored by the heterogeneous platform configuration selector. In other cases, the heterogeneous platform configuration selector may track a preferred heterogeneous platform configuration by comparing a variable that stores a preferred heterogeneous simulation result with the heterogeneous simulation result of the current iteration. If the heterogeneous simulation result of the current iteration is better (e.g., has a lower cost) than the one stored in the variable, the heterogeneous platform configuration selector may update the variable with the heterogeneous simulation result of the current iteration. If not, the heterogeneous platform configuration selector may ignore the heterogeneous simulation result of the current iteration.
The heterogeneous platform configuration selector can continue executing iterations until the split point has iterated over each job in the job list 410. After finishing the iterations, the heterogeneous platform configuration selector may select the heterogeneous simulation result that reduces the cost of executing the workload. For example, the heterogeneous platform configuration selector may iterate over the stored heterogeneous simulation results and select the one with the lowest cost. Alternatively, the heterogeneous simulation result can be obtained by accessing the variable that stores the preferred heterogeneous simulation result.
In this manner (e.g., by iterating over the sorted job list), the heterogeneous platform configuration selector can find the heterogeneous simulation result and the job list split that lead to a solution with minimal cost. Once the heterogeneous simulation result is selected, the heterogeneous platform configuration selector can place data across the clusters of the heterogeneous platform configuration. For example, for the selected heterogeneous simulation result, the heterogeneous platform configuration selector may cause the jobs $J_1, \ldots, J_i$ to be executed on a sub-cluster composed of small VM instances and the jobs $J_{i+1}, \ldots, J_n$ to be executed on a sub-cluster of large VM instances.
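The sort, split, simulate, aggregate, and select loop of the method 400 can be sketched in Python as follows. The simulators here are hypothetical stand-ins (a fixed cluster size and additive per-job costs) for the simulators 420, 422, which in the text choose a cluster size that meets the quality of service metric; all class and function names are illustrative. The loop keeps only the cheapest heterogeneous result seen so far, corresponding to the variable-tracking variant described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class HomogeneousResult:
    num_instances: int  # cluster size chosen by a simulator (e.g., K-Small)
    cost: float         # cost of running the job sub-list on that cluster

@dataclass
class HeterogeneousResult:
    split: int    # ordered[:split] -> small cluster, ordered[split:] -> large cluster
    k_small: int
    k_large: int
    cost: float   # Cost-Small + Cost-Large

# Hypothetical simulator interface: job sub-list -> homogeneous result.
Simulator = Callable[[Sequence[str]], HomogeneousResult]

def select_heterogeneous(
    jobs: Sequence[str],
    rank_s_l: dict[str, float],
    simulate_small: Simulator,
    simulate_large: Simulator,
) -> tuple[list[str], Optional[HeterogeneousResult]]:
    # Ascending Rank_{s&l}: jobs that lean toward small instances come first.
    ordered = sorted(jobs, key=lambda j: rank_s_l[j])
    best: Optional[HeterogeneousResult] = None
    # Move the split point from before the first job to after the last job.
    for split in range(len(ordered) + 1):
        small = simulate_small(ordered[:split])
        large = simulate_large(ordered[split:])
        candidate = HeterogeneousResult(
            split, small.num_instances, large.num_instances, small.cost + large.cost
        )
        # Keep only the cheapest heterogeneous result seen so far.
        if best is None or candidate.cost < best.cost:
            best = candidate
    return ordered, best

def toy_simulator(per_job_cost: dict[str, float], k: int) -> Simulator:
    """Hypothetical stand-in: fixed cluster size k, additive per-job costs."""
    def simulate(sub_list: Sequence[str]) -> HomogeneousResult:
        if not sub_list:
            return HomogeneousResult(0, 0.0)  # an empty sub-list needs no cluster
        return HomogeneousResult(k, sum(per_job_cost[j] for j in sub_list))
    return simulate

small_costs = {"A": 1.0, "B": 5.0, "C": 6.0}
large_costs = {"A": 3.0, "B": 2.0, "C": 4.0}
rank = {j: small_costs[j] - large_costs[j] for j in small_costs}
ordered, best = select_heterogeneous(
    ["A", "B", "C"], rank, toy_simulator(small_costs, 8), toy_simulator(large_costs, 2)
)
print(ordered, best)  # expected: ['A', 'C', 'B'] with split=1 and cost=7.0
```

With these toy costs, running only job A on small instances and the rest on large instances (cost 7.0) beats both all-large (9.0) and all-small (12.0).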
In some cases, the method 400 may be broadened to consider platform configurations beyond large and small instance types. For example, in some cases, the second job sub-list may be re-ranked according to another rank value, such as $Rank_{m\&l}(J)$. Then, the second job sub-list may be further split into two job sub-sub-lists using a second split point, where a simulator for medium VM instances executes the first job sub-sub-list and a simulator for large VM instances executes the second job sub-sub-list. Each iteration of the method 400 may then increment the second split point, and the first split point may increment once the second split point has iterated over the second job sub-list.
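The nested split-point iteration for three instance types can be enumerated as in the following Python sketch (the function name is hypothetical). Each yielded triple would then be passed to small, medium, and large simulators with the three costs summed, as in the two-way case; that step is omitted here. For $n$ jobs the sweep yields $(n+1)(n+2)/2$ candidate partitions, so the search remains polynomial in the number of jobs.

```python
from typing import Iterator, Sequence

def three_way_partitions(
    ordered_s_l: Sequence[str], rank_m_l: dict[str, float]
) -> Iterator[tuple[list[str], list[str], list[str]]]:
    """Yield (small, medium, large) job sub-lists for every pair of split points.

    ordered_s_l is the job list already sorted in ascending Rank_{s&l} order.
    """
    for first_split in range(len(ordered_s_l) + 1):
        small = list(ordered_s_l[:first_split])
        # Re-rank the second job sub-list by Rank_{m&l} before splitting it again.
        rest = sorted(ordered_s_l[first_split:], key=lambda j: rank_m_l[j])
        # The second split point sweeps the whole second sub-list
        # before the first split point is incremented.
        for second_split in range(len(rest) + 1):
            yield small, rest[:second_split], rest[second_split:]

parts = list(three_way_partitions(["A", "C", "B"], {"A": 0.0, "B": -1.0, "C": 1.0}))
print(len(parts))  # 10 candidate partitions for 3 jobs
```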
As discussed above, for example, with reference to simulators 420, 422 of
At operation 502, a scheduler can produce a schedule (that includes an order of execution of jobs and respective tasks) that reduces (or minimizes) the overall completion time of a job sub-list. In some examples, the Johnson scheduling technique for identifying a schedule of concurrent jobs can be used. In general, the Johnson scheduling technique may provide a decision rule for ordering tasks that involve multiple processing stages. In other implementations, other techniques for determining a schedule of jobs can be employed. For example, an improved schedule can be determined using a brute-force technique, where multiple orders of jobs are considered and the order with the smallest (or a smaller) execution time is selected as the optimal (or an improved) schedule.
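The brute-force alternative can be sketched in Python under a simplified two-stage model, which is an assumption of this sketch: each job is a pair (map stage duration, reduce stage duration), map stages run back to back, and a job's reduce stage starts once both its own map stage and the previous job's reduce stage have finished. Because it enumerates all $n!$ orders, this approach is practical only for small job sub-lists; the names are illustrative.

```python
from itertools import permutations
from typing import Sequence

Job = tuple[float, float]  # (map stage duration, reduce stage duration)

def makespan(order: Sequence[Job]) -> float:
    """Completion time of jobs run in the given order under the two-stage model."""
    map_end = 0.0     # time at which the current job's map stage finishes
    reduce_end = 0.0  # time at which the previous job's reduce stage finishes
    for m, r in order:
        map_end += m  # map stages run back to back
        # A reduce stage waits for its own map stage and the previous reduce stage.
        reduce_end = max(reduce_end, map_end) + r
    return reduce_end

def brute_force_schedule(jobs: list[Job]) -> tuple[list[Job], float]:
    """Try all n! orders and keep one with the smallest makespan."""
    best = min(permutations(jobs), key=makespan)
    return list(best), makespan(best)

order, total = brute_force_schedule([(20.0, 2.0), (2.0, 20.0)])
print(order, total)  # [(2.0, 20.0), (20.0, 2.0)] 24.0
```

Running the short-map job first lets its long reduce stage overlap the other job's long map stage; the opposite order takes 42.0 time units.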
At operation 504, the simulator may simulate the job sub-list.
The result of the simulation executed at operation 504 may form a homogeneous simulation result 510. A homogeneous simulation result may include, among other things, a pair (NumNodes, Cost), where NumNodes specifies the cluster size (the number of computing nodes in a cluster), and Cost represents the cost to the tenant of executing the job sub-list on a cluster of the specified size. The cost can be based on the price charged to a tenant for the respective platform configuration for a given amount of time.
In some cases, operations 502, 504 shown in
After the search space has been built, the simulator may, at operation 512, select a homogeneous simulation result from the generated homogeneous simulation results. In some examples, the simulator can select a homogeneous simulation result based on selecting the homogeneous simulation result with the lowest cost in the set of homogeneous simulation results.
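Building the search space and selecting from it (operation 512) can be sketched in Python as follows. The makespan model passed in is hypothetical and stands in for the scheduler and simulator; the cost is taken as cluster size × per-node price × makespan; and the quality of service metric is assumed to be a deadline that a result's makespan must meet, as suggested by the description of the simulators 420, 422. Storing the makespan in each result is likewise an assumption.

```python
from typing import Callable, NamedTuple, Optional

class HomogeneousSimulationResult(NamedTuple):
    num_nodes: int   # NumNodes: cluster size
    makespan: float  # simulated completion time (kept to check the deadline)
    cost: float      # price charged for the cluster over that makespan

def build_search_space(
    simulate_makespan: Callable[[int], float], max_nodes: int, price_per_node: float
) -> list[HomogeneousSimulationResult]:
    """Repeat scheduling and simulation for every cluster size 1..max_nodes."""
    results = []
    for n in range(1, max_nodes + 1):
        ms = simulate_makespan(n)
        results.append(HomogeneousSimulationResult(n, ms, n * price_per_node * ms))
    return results

def select_result(
    results: list[HomogeneousSimulationResult], deadline: float
) -> Optional[HomogeneousSimulationResult]:
    """Lowest-cost result whose makespan meets the deadline, or None."""
    feasible = [res for res in results if res.makespan <= deadline]
    return min(feasible, key=lambda res: res.cost) if feasible else None

# Hypothetical makespan model: parallelizable work plus per-node overhead.
space = build_search_space(lambda n: 120.0 / n + 2.0 * n, max_nodes=10, price_per_node=1.0)
print(select_result(space, deadline=40.0))
```

Here the smallest cluster that meets the 40-unit deadline (4 nodes) is also the cheapest feasible one, because adding nodes raises the per-time-unit price faster than it shortens the makespan.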
The following further describes determining a schedule of jobs of a workload, according to some implementations, which was introduced above with reference to operation 502 of
The following considers an example execution of two (independent) MapReduce jobs J1 and J2 in a cluster, in which no data dependencies exist between the jobs. As shown in
A first execution order of the jobs may lead to less efficient resource usage and an increased processing time as compared to a second execution order of the jobs. To illustrate this, consider an example workload that includes the following two jobs:
There are two possible execution orders for jobs J1 and J2 shown in
More generally, there can be a substantial difference in the job completion time depending on the execution order of the jobs of a workload. A workload {J1, J2, . . . , Jn} includes a set of n MapReduce jobs with no data dependencies between them. The scheduler generates an order (a schedule) of execution of the jobs Ji in the workload such that the makespan of the workload (the overall completion time of all of its jobs) is minimized. To minimize the makespan of the workload {J1, J2, . . . , Jn}, the Johnson scheduling technique can be used.
Each job Ji in the workload of n jobs can be represented by the pair (mi, ri) of map and reduce stage durations, respectively. In some examples, the values of mi and ri can be estimated using lower and upper bounds, as discussed above. Each job Ji = (mi, ri) can be augmented with an attribute Di that is defined as follows:

$$D_i = \begin{cases} (m_i, m) & \text{if } \min(m_i, r_i) = m_i \\ (r_i, r) & \text{otherwise} \end{cases}$$
The first argument in Di is referred to as the stage duration and denoted as Di1. The second argument in Di is referred to as the stage type (map or reduce) and denoted as Di2. In the above, (mi, m), mi represents the duration of the map stage, and m denotes that the type of the stage is a map stage. Similarly, in (ri, r), ri represents the duration of the reduce stage, and r denotes that the type of the stage is a reduce stage.
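The attribute Di can be computed directly from the two stage durations, as in the minimal sketch below. The function name and the string labels "m" and "r" for the two stage types are assumptions. A tie (mi = ri) is treated as a map stage, consistent with the condition min(mi, ri) = mi.

```python
from typing import Tuple


def johnson_attribute(m: float, r: float) -> Tuple[float, str]:
    """Return D_i = (stage duration, stage type) for a job J_i = (m_i, r_i).

    D_i^1 is the shorter of the two stage durations; D_i^2 is "m" when the
    map stage is the shorter one (ties count as map) and "r" otherwise.
    """
    return (m, "m") if m <= r else (r, "r")


print(johnson_attribute(3.0, 7.0))  # (3.0, 'm')
print(johnson_attribute(9.0, 2.0))  # (2.0, 'r')
```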
An example pseudocode of the Johnson scheduling technique is provided below.
The Johnson scheduling technique (as performed by the scheduler) depicted above is discussed in connection with
Line 1 of the pseudocode sorts the n jobs of the set into the ordered list L in such a way that job Ji precedes job Ji+1 in the ordered list L if and only if min(mi, ri) ≤ min(mi+1, ri+1). In other words, the jobs are sorted using the stage duration attribute Di1 in Di (the stage duration attribute Di1 represents the smaller of the two stage durations).
The pseudocode takes jobs from the ordered list L and places them into the schedule σ (represented by the scheduling queue 802) from the two ends (head and tail), and then proceeds to place further jobs from the ordered list L in the intermediate positions of the scheduling queue 802. As specified at lines 4-6 of the pseudocode, if the stage type Di2 in Di is m, i.e., Di2 represents the map stage type, then job Ji is placed at the current available head of the scheduling queue 802 (as represented by head, which is initialized to the value 1). Once job Ji is placed in the scheduling queue 802, the value of head is incremented by 1 (so that a next job would be placed at the next head position of the scheduling queue 802).
As specified at lines 7-9 of the pseudocode, if the stage type Di2 in Di is not m, then job Ji is placed at the current available tail of the scheduling queue 802 (as represented by tail, which is initialized to the value n). Once job Ji is placed in the scheduling queue 802, the value of tail is decremented by 1 (so that a next job would be placed at the next tail position, one slot closer to the head of the scheduling queue 802).
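The sorting and head/tail placement described above can be sketched in Python as follows. This is an illustrative reconstruction from the prose, not the referenced pseudocode itself; it uses 0-based indices, so head starts at 0 and tail at n - 1 rather than at 1 and n. The `Job` tuple layout and the function name are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical job representation: (name, map duration m_i, reduce duration r_i).
Job = Tuple[str, float, float]


def johnson_schedule(jobs: Sequence[Job]) -> List[Job]:
    """Order jobs with Johnson's rule to reduce the two-stage makespan."""
    n = len(jobs)
    # Sort by the stage duration D_i^1 = min(m_i, r_i), shortest first.
    ordered = sorted(jobs, key=lambda job: min(job[1], job[2]))
    queue: List[Optional[Job]] = [None] * n
    head, tail = 0, n - 1  # 0-based versions of head = 1 and tail = n
    for job in ordered:
        _, m, r = job
        if m <= r:
            # Stage type D_i^2 is m: fill from the head, moving toward the tail.
            queue[head] = job
            head += 1
        else:
            # Stage type D_i^2 is r: fill from the tail, moving toward the head.
            queue[tail] = job
            tail -= 1
    # Every slot is filled exactly once, so no None entries remain.
    return [job for job in queue if job is not None]


jobs = [("A", 3.0, 6.0), ("B", 5.0, 2.0), ("C", 1.0, 2.0)]
print([name for name, _, _ in johnson_schedule(jobs)])  # ['C', 'A', 'B']
```

For these three hypothetical jobs, the sorted list is C, B, A; C and A are placed from the head and B from the tail, giving the schedule C, A, B. Jobs whose map stage is short thus start early, while jobs whose reduce stage is short finish last, which lets map and reduce stages overlap.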
The processor 910 may be a central processing unit (CPU), a semiconductor-based microprocessor, a graphics processing unit (GPU), other hardware devices or circuitry suitable for retrieval and execution of instructions stored in computer-readable storage device 920, or combinations thereof. For example, the processor 910 may include multiple cores on a chip, multiple cores across multiple chips, multiple cores across multiple devices, or combinations thereof. The processor 910 may fetch, decode, and execute instructions to implement methods and operations discussed above, with reference to
As an alternative or in addition to retrieving and executing instructions, processor 910 may include at least one integrated circuit (“IC”), other control logic, other electronic circuits, or combinations thereof that include a number of electronic components for performing the functionality of instructions 922.
Computer-readable storage device 920 may be any electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions. Thus, the computer-readable storage device 920 may be, for example, Random Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage drive, a Compact Disc Read Only Memory (CD-ROM), non-volatile memory, and the like. As such, the computer-readable storage device 920 can be non-transitory. As described in detail herein, computer-readable storage device 920 may store heterogeneous platform configuration selection instructions 922 for selecting a heterogeneous platform configuration. The instructions 922 may cause the processor 910 to receive a job list specifying jobs of a prospective workload. The instructions may also cause the processor to split the job list into a first job sub-list and a second job sub-list. A first homogeneous simulation result may then be generated based on a simulation of the first job sub-list using a first homogeneous platform configuration. A second homogeneous simulation result may also be generated based on a simulation of the second job sub-list using a second homogeneous platform configuration. The instructions may then cause the processor to generate a heterogeneous simulation result based on the first homogeneous platform configuration and the second homogeneous platform configuration. The processor may then, based on executing the instructions 922, provision resources on a cloud infrastructure according to the heterogeneous simulation result. As discussed above, additional or alternative instructions may be stored by the computer-readable storage device 920 that may cause the processor to execute any of the operations discussed above.
As used herein, the term “computer system” may refer to one or more computer devices, such as the computer device 900 shown in
While this disclosure makes reference to some examples, various modifications to the described examples may be made without departing from the scope of the claimed features.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2015/012713 | 1/23/2015 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2016/118159 | 7/28/2016 | WO | A |
Number | Date | Country |
---|---|---|
20170228676 A1 | Aug 2017 | US |