Energy efficient scheduling for computing systems and method therefor

Description

FIELD OF THE INVENTION

The present invention relates to the field of server energy and cooling management.

BACKGROUND OF THE INVENTION

The data center energy crisis has been in the making for the past several decades, as data centers are designed primarily with peak performance and peak capacity in mind. With transistor counts and performance in semiconductor devices doubling at 18-month intervals following Moore's law, energy dissipation in servers has grown at an alarming rate. At the same time, the smaller form factors of modern blade servers have permitted more and more servers to be packed into a given physical space, further worsening the already critical problem of server power dissipation within data centers. Adding to all of this is the trend to overprovision data center capacity and to use overrated power supplies for individual servers. Such overprovisioning results in gross energy inefficiency, because servers and power supplies are generally designed to achieve very high energy efficiency only at or near peak loading levels. The net result is that 50% or more of the total cost of ownership (TCO) of a data center lies in the utility costs of operating and cooling the servers. From an economic standpoint, about 2% of the nation's annual energy consumption goes to data centers. With electricity costs growing at about 7% to 10% annually, the situation is bleak and calls for immediate correction through innovative and dramatic solutions. The other benefits of operating energy-efficient data centers are no less significant: reducing the carbon footprint and making the nation energy-secure are also worthy goals.

Traditional approaches to managing the data center energy crisis have been to use advanced cooling and packaging solutions, to use DC power sources for servers, and to apply a variety of other solutions aimed at reducing the energy dissipation within servers. These latter solutions have included dynamically changing the power-performance settings of individual server components, such as processors and hard disk drives, and policy-based job scheduling that distributes the offered workload across servers to meet thermal objectives. The growing use of virtualization technologies in data centers also supports flexible, scheduling-based energy management schemes. Virtually all of these solutions are reactive in nature: energy management or cooling solutions are adjusted based on feedback from sensors that sense temperature or some activity parameter (such as current computing load or performance metrics).

SUMMARY OF THE INVENTION

The present technology assumes, according to one embodiment, a holistic view of data centers as a cyberphysical system in which the cooling solutions work in unison with the computing-level solutions for energy management in a coordinated fashion. The total energy expended in the computing components and in the cooling system is treated as a first class resource that needs to be scheduled explicitly to maximize the overall energy efficiency of the data center. One embodiment of the technology is multi-tiered and includes:

- The use of fast models for predicting local and global thermal conditions to promote overall energy efficiency. The thermal models, in turn, are driven by empirical models of energy dissipation within servers and switches as a function of the measured values of a variety of actual activity counts. This approach of jointly using accurate energy dissipation models for the computing equipment and fast thermal models permits the cooling solutions (adjustment of inlet temperature, air flow speed and pattern) to be proactive.
- The use of a global scheduler to allocate individual energy budgets to servers as a function of the workload, the predicted thermal trend, actual server utilizations and temperature and airflow measurements from sensors. The cooling efforts are also matched to the predicted thermal trends and are rack specific, instead of being rack agnostic, as in traditional systems. Alternatively stated, the cooling efforts for a rack are directed, dynamic and matched to the thermal conditions in the rack's environment. This results in the most energy-efficient use of the cooling resources.
- The use of modified server operating system kernels that permit the individual servers to stay within their assigned energy consumption budgets. Software solutions at the operating system kernel level exercise existing power management actuators inside the processor and other components of servers, in a proactive fashion to stay within the dictated energy budget and in a reactive fashion based on the thermal condition of the server's environment. Thus, the system uses a predictive model of the thermal conditions based on analysis of a set of “tasks” or other prospective activities, as well as a feedback-driven control that employs sensors or other indicia of actual conditions. The predictive model may be adaptive, that is, it may be modified in dependence on the actual outcomes as determined by the sensors or indicia. In addition to the sensor or indicia inputs, the system may also receive a price or cost input, which permits a price or cost optimization rather than an efficiency optimization. By imposing an external price or cost consideration, the system can be made responsive to peak energy demand considerations and to a prioritization of tasks, each of which may be associated with a task value.
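The interplay of a fast empirical energy model, an assigned budget, and sensor feedback can be sketched as follows. This is a minimal illustration only, assuming a linear power model over activity rates; the class `LinearPowerModel`, the counter names, the coefficients, and the simple baseline-correction rule are all hypothetical and are not taken from the description.

```python
from dataclasses import dataclass, field


@dataclass
class LinearPowerModel:
    """Hypothetical empirical model: P = idle_watts + sum(coef[k] * rate[k])."""
    idle_watts: float
    coef: dict = field(default_factory=dict)  # watts per unit of each activity rate

    def predict(self, rates: dict) -> float:
        # Counters without a coefficient contribute nothing to the estimate.
        return self.idle_watts + sum(self.coef.get(k, 0.0) * v for k, v in rates.items())

    def adapt(self, rates: dict, measured_watts: float, alpha: float = 0.5) -> None:
        # Adaptive step (assumed rule): nudge the baseline toward the measured
        # draw so the predictive model tracks outcomes reported by sensors.
        self.idle_watts += alpha * (measured_watts - self.predict(rates))


def within_budget(model: LinearPowerModel, rates: dict, budget_watts: float) -> bool:
    """Proactive check: would the predicted draw stay inside the assigned budget?"""
    return model.predict(rates) <= budget_watts


model = LinearPowerModel(100.0, {"instr_per_ns": 20.0, "llc_miss_per_us": 5.0, "disk_io_per_ms": 2.0})
rates = {"instr_per_ns": 2.0, "llc_miss_per_us": 4.0, "disk_io_per_ms": 5.0}
print(model.predict(rates))                # 170.0
print(within_budget(model, rates, 160.0))  # False
```

In a kernel-level realization, the rates would come from hardware activity counters and the budget from the global scheduler; a real model would also need per-platform fitting of the coefficients.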

Each of these technologies may be employed together, separately, or in subcombination. The thermal models, for example, can be implemented with minor modification to semiconductor devices, to provide software access to registers and counters which monitor operation of the chip. As the chip processes information, various types of activities are tracked, and these tracked activities may then be read by software to implement the models. The models may be executed on the same semiconductor as an additional process within a multitasking processing stream, within a special core dedicated to this process, either on or off the integrated circuit, or by a remote system. The modified server operating system kernels typically do not require hardware modifications, though sensors may be required beyond those present in standard components of the computing system. In particular, integration and interfacing of external cooling system sensors and controls may require additional hardware modules. The global scheduler is typically provided as part of a load distribution switch, which is a standard hardware component, but executes software in accordance with the present embodiments. In particular, the task allocation algorithm favors loading of servers to near capacity, which may be defined by performance limitations or thermal limitations, before allocating tasks to other servers. The allocation may distinguish between different blades within a rack, with each rack typically being controlled on a thermal basis, i.e., to stay within a desired thermal envelope while achieving cost-efficient cooling, while each blade may be allocated tasks which balance performance and energy efficiency, while remaining within safe thermal limits.
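The allocation rule of loading servers to near capacity before allocating tasks to other servers resembles first-fit bin packing. A minimal sketch follows, assuming that each task's demand and each server's capacity are expressed in the same (hypothetical) units and that "near capacity" means a fixed `headroom` fraction; the description does not prescribe this particular heuristic.

```python
def allocate(tasks, capacities, headroom=0.9):
    """First-fit packing: place each task on the first server (in a fixed
    order) whose load would stay within headroom * capacity, so early servers
    fill up before later ones receive any work. Returns (placement, load);
    placement[i] is None when no server can accept task i."""
    load = [0.0] * len(capacities)
    placement = []
    for demand in tasks:
        target = next(
            (i for i, cap in enumerate(capacities) if load[i] + demand <= headroom * cap),
            None,
        )
        if target is not None:
            load[target] += demand
        placement.append(target)
    return placement, load


placement, load = allocate([4, 4, 4, 4], [10, 10, 10])
print(placement)  # [0, 0, 1, 1]
print(load)       # [8.0, 8.0, 0.0]
```

The third server receives no work and could be idled, which is the point of packing rather than spreading the load; the capacity figure could equally represent a performance limit or a thermal limit.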

The net result of combining all of these elements is a control system that uses proactive and reactive elements in a multi-tiered strategy for co-managing the thermal and computing solutions to promote the energy efficiency (or cost effectiveness) of the data center. However, these technologies need not be employed together to gain benefits. Likewise, the chip, operating system (software), and system level optimizers need not communicate with each other, though they are preferably aware of the multilevel optimizations, which may alter responses to conditions. For example, predictions of and control over future processing load must be coordinated between the various system levels in order to avoid conflicting efforts or overcompensation.

A preferred embodiment may be implemented in a scaled-down data center consisting of Linux server racks with a floor plenum, portable computer room air conditioners (CRACs) and a variety of sensors, or in a full data center with server racks in a facility with a centrally or distributively controlled cooling system. Preliminary results indicate that the present approach can realize about a 20% improvement in the energy efficiency of the data center.

With the growing concern about energy expenditures in server installations, mechanisms are sought for improving the energy-efficiency of server installations. Jobs running on a typical data or compute server are quite diverse in their energy-performance requirements. The present technology provides a system and method for dynamically configuring a server system and its associated job scheduling strategy to elevate the energy-efficiency of the server system.

Typical servers have a set of “gears”, i.e., performance settings that trade performance in one or more respects against a “cost”, such as energy cost, thermal rise, or another metric. These settings permit an administrator to choose the most energy-efficient setting that realizes a specific performance goal for a specific type of application running on the server. The settings can control the processor, the dynamic random access memory (RAM), the network interface and associated software, and the secondary storage system, individually or in combination. For example, a system administrator may program the clock rate of a computer, select RAM performance based on wait states, and apply other performance tweaks. In general, higher performance settings have a higher energy cost, in both the unloaded and the fully loaded state.
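Choosing "the most energy-efficient setting for realizing a specific performance goal" can be illustrated as a constrained minimum over the available gears. This sketch assumes each gear has been characterized by a throughput and an average power (the `Gear` type and all figures are hypothetical), and it uses energy per unit of work, power divided by throughput, as the cost.

```python
from typing import NamedTuple, Optional


class Gear(NamedTuple):
    name: str
    throughput: float  # work units per second at this setting (hypothetical calibration)
    power: float       # average watts drawn at this setting


def pick_gear(gears, required_throughput) -> Optional[Gear]:
    """Among gears fast enough for the goal, return the one with the lowest
    energy per unit of work (power / throughput, in joules per unit)."""
    feasible = [g for g in gears if g.throughput >= required_throughput]
    if not feasible:
        return None
    return min(feasible, key=lambda g: g.power / g.throughput)


gears = [Gear("low", 50.0, 50.0), Gear("mid", 80.0, 90.0), Gear("high", 100.0, 150.0)]
print(pick_gear(gears, 40.0).name)  # low  (1.0 J per unit)
print(pick_gear(gears, 70.0).name)  # mid  (1.125 J per unit)
print(pick_gear(gears, 120.0))      # None
```

Using energy per unit of work rather than raw power matters: a faster gear can occasionally be cheaper per unit of work even though it draws more power.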

In systems having alterable performance settings, one limiting factor is often maximum system component temperature, which in turn is impacted by heat dissipation. Therefore, by establishing performance settings which are near those minimally necessary to achieve suitable results, the system may be able to achieve higher throughput, since the power budget is better distributed to those aspects which are in most need, and not wasted on excess component performance which does not add to task-normalized system performance.

A technique is provided for learning the energy-performance characteristics of jobs or phases of jobs as they execute within a computing system. This information is used to optimize the scheduling of new jobs as well as repetitive jobs with the server characteristics set dynamically to meet the performance demands using the most energy-efficient configuration for realizing the performance goal. The technique also optimizes the energy and performance overhead for dynamically changing the energy-performance setting of the server.

A mechanism is provided for classifying entire jobs, or phases of individual jobs, into different classes, where each class is associated with a unique performance gear setting. The jobs or job phases are classified into at least the following categories: compute bound, memory bound, network bound, and secondary storage bound. This classification technique relies on statistics maintained by the hardware and the operating system of a typical server, and is performed dynamically as the job executes. A single performance setting or a set of performance settings is associated with each such class. Job identities and their classifications may be maintained in tabular form within the server system; alternatively, a process control block or other data structure may be employed.
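One simple way to realize the four-way classification is to assign a phase to the class of its most heavily used resource. The sketch below assumes the hardware and operating system statistics have already been reduced to normalized figures under the hypothetical keys `cpu`, `mem_stall`, `net` and `disk`; the gear names in `GEAR_FOR_CLASS` are placeholders, not real platform settings.

```python
from enum import Enum


class PhaseClass(Enum):
    COMPUTE = "compute bound"
    MEMORY = "memory bound"
    NETWORK = "network bound"
    STORAGE = "secondary storage bound"


# Hypothetical mapping from class to a gear name; real settings are platform specific.
GEAR_FOR_CLASS = {
    PhaseClass.COMPUTE: "cpu-high/mem-low",
    PhaseClass.MEMORY: "cpu-low/mem-high",
    PhaseClass.NETWORK: "cpu-low/nic-high",
    PhaseClass.STORAGE: "cpu-low/disk-high",
}


def classify(stats: dict) -> PhaseClass:
    """stats: normalized (0..1) figures sampled over a phase, e.g. derived
    from hardware counters and OS utilization counters. The phase is assigned
    to the class of its most heavily used resource (ties go to the first)."""
    scores = {
        PhaseClass.COMPUTE: stats.get("cpu", 0.0),
        PhaseClass.MEMORY: stats.get("mem_stall", 0.0),
        PhaseClass.NETWORK: stats.get("net", 0.0),
        PhaseClass.STORAGE: stats.get("disk", 0.0),
    }
    return max(scores, key=scores.get)


job_table = {}  # job identity -> class, i.e. the tabular form mentioned above
job_table["nightly-backup"] = classify({"cpu": 0.2, "disk": 0.85})
print(job_table["nightly-backup"], GEAR_FOR_CLASS[job_table["nightly-backup"]])
```

A production classifier would likely use thresholds or subclasses rather than a bare maximum, and would re-run as new samples arrive so that the class tracks the current phase.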

Jobs submitted to servers are often very repetitive in nature. When such jobs are encountered, their class can be looked up in the aforesaid table. In one embodiment, queued jobs belonging to the same class are scheduled back-to-back using a performance setting appropriate for the class. Such a scheduling strategy minimizes the energy overhead and performance penalty that would normally arise from changing the gear setting from one job to the next. The same scheduling strategy can be applied to individual phases of a job: the scheduler runs phases of different jobs back-to-back when those phases belong to the same class. A further refinement of this technique would be to divide each class into subclasses, with a unique performance setting for each subclass. Unclassified jobs or job phases are scheduled using default assumptions about the job class; this classification can change as the job executes, based on its observed behavior.
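The back-to-back strategy amounts to grouping queued jobs by their looked-up class so that gear changes happen only between groups. A minimal sketch, assuming class labels are plain strings and that unclassified jobs default to a hypothetical `"cpu"` class; it ignores deadlines and fairness, which a real scheduler would have to respect.

```python
def order_back_to_back(queue, class_table, default="cpu"):
    """Stable reordering so that queued jobs of the same class run
    consecutively, with class groups in order of first appearance. Jobs not
    yet in the table get a default class, to be revised once observed."""
    group_rank = {}
    for job in queue:
        group_rank.setdefault(class_table.get(job, default), len(group_rank))
    return sorted(queue, key=lambda j: group_rank[class_table.get(j, default)])


def gear_switches(schedule, class_table, default="cpu"):
    """Number of gear changes needed to run the schedule in order."""
    classes = [class_table.get(j, default) for j in schedule]
    return sum(1 for a, b in zip(classes, classes[1:]) if a != b)


table = {"a": "mem", "b": "cpu", "c": "mem", "d": "cpu"}  # "e" is unclassified
queue = ["a", "b", "c", "d", "e"]
ordered = order_back_to_back(queue, table)
print(ordered)                                                     # ['a', 'c', 'b', 'd', 'e']
print(gear_switches(queue, table), gear_switches(ordered, table))  # 3 1
```

Because Python's `sorted` is stable, jobs keep their arrival order within each class group.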

In an alternate embodiment, complementary tasks are allocated to a single set of resources for execution, such that the available capacity of the server under the specified performance settings is efficiently utilized. For example, one task may place significant loads on the processor and memory, while another mainly loads the hard disk. By combining these tasks for concurrent execution, each proceeds with high efficiency.
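Pairing complementary tasks can be sketched as a greedy search for tasks whose combined per-resource demands fit within one server's capacity. This assumes each task is described by hypothetical fractional demands on named resources under the current performance settings; a greedy first match is only one of many possible pairing policies.

```python
def complementary(a: dict, b: dict, limit: float = 1.0) -> bool:
    """True if two tasks' per-resource demands (fractions of one server's
    capacity under the current settings) fit together on every resource."""
    return all(a.get(r, 0.0) + b.get(r, 0.0) <= limit for r in set(a) | set(b))


def pair_tasks(tasks: dict):
    """Greedily pair each task with the first later task it complements.
    Returns (pairs, leftovers)."""
    names = list(tasks)
    used, pairs = set(), []
    for i, x in enumerate(names):
        if x in used:
            continue
        for y in names[i + 1:]:
            if y not in used and complementary(tasks[x], tasks[y]):
                pairs.append((x, y))
                used.update((x, y))
                break
    leftovers = [n for n in names if n not in used]
    return pairs, leftovers


tasks = {
    "render": {"cpu": 0.8, "mem": 0.6, "disk": 0.1},
    "backup": {"cpu": 0.1, "mem": 0.2, "disk": 0.8},
    "encode": {"cpu": 0.7, "mem": 0.5},
}
print(pair_tasks(tasks))  # ([('render', 'backup')], ['encode'])
```

Here the processor- and memory-heavy `render` pairs with the disk-heavy `backup`, while `encode` would oversubscribe the processor alongside `render` and is left for another resource set.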

However, it is noted that by tuning the performance of the system to a single task or type of task, in some cases a higher energy efficiency is achieved than in heterogeneous processing environments, even where the task allocations are balanced.

In one embodiment of this technology, the task allocation is optimized by software, for example an operating system, running on the very hardware whose energy efficiency is to be optimized while maintaining acceptable performance; the scope of the optimization is the hardware under the control of that software infrastructure.

This technique thus has the potential to improve the energy efficiency of server systems through relatively modest changes in the operating system kernel. No particular reliance is placed on unique or dedicated hardware support.

In order to calibrate the system, a test suite of programs may be provided that exercises different performance aspects of the system. One or more sensors may be provided to monitor the results, for example temperature. In some cases, system components such as processors, hard drives, memory modules, motherboards, fans, cases/enclosures, and other hardware components have integral temperature sensors, and the operating system may read these sensors. In other cases, sensors are added to the system. Thus, the system may be calibrated for thermal output/performance under different task loading conditions. For example, a multidimensional table or other data structure may be created from this calibration data and used as a lookup table during operation, or the table may be processed to generate a compact algorithm that predicts thermal performance from task characteristics. When the system is running arbitrary tasks, the operating system may initially seek to classify each task based on certain characteristics. These characteristics are then used to query the lookup table or exercise the compact algorithm, which yields an initial allocation of the task and a specification of performance settings or, in the case of a system encompassing multiple variants with different performance criteria, an allocation of the task to the selected system. While the live task is running, the predicted performance may be compared with the actual performance, and the system corrected. In general, the correction will be applied to the code module being executed, or to a signature or key features of that module, since the table itself is established from calibration data. Of course, a system may also skip the calibration phase and construct a table or algorithm exclusively from live data.
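The calibrate-then-correct scheme, in which the table stays fixed and corrections attach to the code module, can be sketched as follows. Assumptions: the table is keyed by a hypothetical (task class, load level) pair and predicts a temperature rise; each module signature carries an additive bias updated by an exponential moving average. None of these names or numbers come from the description.

```python
class ThermalPredictor:
    """Calibration table plus per-module corrections learned from live runs."""

    def __init__(self, table: dict, alpha: float = 0.25):
        self.table = table    # (task_class, load_level) -> predicted temperature rise, deg C
        self.alpha = alpha    # learning rate for the live corrections
        self.correction = {}  # code-module signature -> additive bias, deg C

    def predict(self, signature, task_class, load_level) -> float:
        base = self.table[(task_class, load_level)]
        return base + self.correction.get(signature, 0.0)

    def observe(self, signature, task_class, load_level, measured_rise) -> None:
        # The calibration table itself stays fixed; only the bias attached to
        # this module's signature moves a fraction alpha toward the observed error.
        error = measured_rise - self.predict(signature, task_class, load_level)
        self.correction[signature] = self.correction.get(signature, 0.0) + self.alpha * error


pred = ThermalPredictor({("compute", "high"): 12.0, ("storage", "high"): 4.0})
print(pred.predict("mod_a", "compute", "high"))  # 12.0
pred.observe("mod_a", "compute", "high", 16.0)
print(pred.predict("mod_a", "compute", "high"))  # 13.0
```

Keeping the correction per signature means a new, unseen module still starts from the calibrated baseline.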

In order to optimize the system performance, a genetic algorithm or other self-tuning algorithm may be implemented, in which during operation, or between successive operations, parameters are changed to “search” for a more efficient operating point, or otherwise to test the effect of various changes in performance and efficiency. Over time, the control system, which is, e.g., the operating system of the computing system which controls task initiation and allocation, will learn, at any existing operating point of the system, how an incremental task of a characterized type is most efficiently performed. A variety of options may be available, including waiting until a future contingency is satisfied, distributing the task to the resources that are predicted to permit completion soonest (i.e., the task runs with highest urgency), distributing the task to existing resource sets with particular performance profiles, and modifying the performance profile of a set of resources on which to execute the task.
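The description names a genetic algorithm or another self-tuning algorithm. The sketch below swaps in the simplest such search, a greedy hill climb over a single discrete setting, purely to illustrate the loop of searching for a more efficient operating point. `measure` stands in for a hypothetical routine that runs a workload at a given setting and returns an efficiency figure (higher is better).

```python
def hill_climb(measure, start, lo, hi, max_steps=50):
    """Greedy local search over an integer setting in [lo, hi]: try the
    neighbors one step away, move whenever measured efficiency improves,
    and stop at a local optimum (or after max_steps moves)."""
    best, best_score = start, measure(start)
    for _ in range(max_steps):
        improved = False
        for cand in (best - 1, best + 1):
            if lo <= cand <= hi:
                score = measure(cand)
                if score > best_score:
                    best, best_score, improved = cand, score, True
        if not improved:
            break
    return best, best_score


# Synthetic efficiency curve peaking at setting 5 (stand-in for real measurements).
print(hill_climb(lambda g: -(g - 5) ** 2, start=1, lo=0, hi=7))  # (5, 0)
```

Real measurements are noisy, so a practical version would average several samples per setting; a genetic algorithm would instead evolve a population of multi-parameter settings and is less prone to stalling at a local optimum.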

As discussed above, a penalty may be incurred in altering the performance profile of a set of resources. Further, a set of resources may be used to process a plurality of tasks having a range of optimal hardware performance characteristics. Therefore, adjusting the performance profile ad hoc simply to optimize for a new task is disfavored unless the current profile is suboptimal for a significant predicted set of tasks, such that the alteration in performance characteristics produces a net benefit.
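The net-benefit test can be made concrete: switch profiles only if the energy saved over the predicted upcoming tasks exceeds the one-time cost of switching. A minimal sketch, assuming hypothetical per-task energy figures for each class under the current and candidate profiles.

```python
def should_switch(upcoming, energy_now, energy_new, switch_cost_j):
    """upcoming: predicted classes of the next tasks on this resource set.
    energy_now / energy_new: joules per task, by class, under the current and
    candidate performance profiles. Switch only if the summed savings exceed
    the one-time cost of changing the profile."""
    saving = sum(energy_now[c] - energy_new[c] for c in upcoming)
    return saving > switch_cost_j, saving


energy_now = {"cpu": 10, "mem": 16}  # current profile tuned for compute-bound work
energy_new = {"cpu": 13, "mem": 11}  # candidate profile tuned for memory-bound work
print(should_switch(["mem", "mem", "cpu"], energy_now, energy_new, 5))  # (True, 7)
print(should_switch(["mem"], energy_now, energy_new, 5))                # (False, 5)
```

A single memory-bound task does not justify the switch, while a predicted run of them does, even though the compute-bound task in the mix becomes more expensive under the new profile.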

If the system as a whole is thermally limited, then there is a net benefit to executing tasks in a power-dissipation-efficient manner, with the entire energy consumption of a task considered as a unit. Often, one or more components of the system, e.g., the processor core(s), will also be thermally limited. Therefore, in addition to being allocated in an energy-efficient manner, tasks must be allocated so that no component thermal limit is exceeded. This may entail allocating to the system some portion of tasks that are relatively inefficient on a system basis but place less stress on a limiting component.

It is noted that the tasks need not be characterized by the operating system at run time. Rather, the compiler or another system may produce an in-stream communication, or parameter file associated with the executable code, that defines a type and target performance environment of execution. The operating system can then seek to allocate the task to hardware with the target performance profile configuration. In general, the reference optimum platform may be different than the actual optimum. For example, the actual platform may have extraneous tasks executing, have a different thermal dissipation profile or different component selection, or otherwise deviate from the reference design. Therefore, empirical data is preferably employed, even if the program code is pre-characterized.

The parameter space for the system, outside of the code itself, includes, for example, the ambient temperature (facility, external), energy cost (which may vary diurnally), peak demand issues, cooling system efficiency (air conditioning or water cooling), air flow rate, air flow dynamics, average system temperature, peak component temperatures, thermal impedance, fan speed(s), etc. In fact, there may be so many factors that explicitly compensating for each of them is difficult or impossible, and the system may include intrinsic responses that defeat external controls or presumptions of a static system. For example, many components of modern computing systems have internal thermal sensors and thermally responsive firmware. Therefore, the response to changes in temperature, especially near peak loads or peak temperatures, may be nonlinear and somewhat unpredictable.

The operating system preferably seeks to profile the code or module to be executed, and to allocate it to a processing system having one of a predefined set of classes of performance settings, for example four classes: processing-centric, memory-centric, disk-centric and network-centric. Additional tasks are typically allocated in accordance with their classification, but if and when the mix of tasks on a system deviates from the nominal, the task scheduler may allocate compensating tasks that seek to renormalize the mix. In this way, a multiprocessing system can gain efficiency, since each system operates near its peak efficiency, which is higher than the efficiency of a general purpose system executing random code segments. If the mix of code changes, in some cases the system may be modified to assume a different performance profile.
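Renormalizing the task mix can be sketched as picking, from the queue, the task whose class is furthest below its nominal share on the system. This assumes a hypothetical `target_mix` of class fractions associated with the system's performance profile; the deficit rule is one plausible choice, not the method of the description.

```python
from collections import Counter


def pick_compensating(running, queue, target_mix):
    """running: classes of tasks currently on the system.
    queue: list of (task_id, task_class) waiting to be placed.
    target_mix: class -> nominal fraction for this system's performance profile.
    Returns the queued task whose class is furthest below its nominal share,
    or None if the queue is empty."""
    if not queue:
        return None
    counts = Counter(running)
    total = max(len(running), 1)

    def deficit(cls):
        return target_mix.get(cls, 0.0) - counts[cls] / total

    return max(queue, key=lambda item: deficit(item[1]))


running = ["cpu", "cpu", "cpu"]
queue = [("t1", "cpu"), ("t2", "disk")]
print(pick_compensating(running, queue, {"cpu": 0.5, "disk": 0.5}))  # ('t2', 'disk')
```

`Counter` returns 0 for classes not yet running, so an entirely absent class shows its full nominal share as its deficit.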

It is therefore an object to provide a method for controlling a computing system, comprising: reading a stored energy-performance characteristic of a plurality of different phases of execution of software, an execution of each phase being associated with a consumption of a variable amount of energy in dependence on at least a processing system performance state, the performance state being defined by a selectable performance-energy consumption optimization for at least two processing system components; scheduling a plurality of phases of execution of the software, in dependence on the stored energy-performance characteristics, for each of the respective phases of execution of the software and at least one system-level energy criterion; and executing the phases of execution of the software in accordance with the scheduling.

It is also an object to provide a computing system, comprising: at least two processing components, each having a selectable performance-energy consumption optimization; a memory configured to store energy-performance characteristics of a plurality of different phases of execution of software, an execution of each phase being associated with a consumption of a variable amount of energy in the computing system in dependence on at least a processing system performance state, the performance state being defined by the selectable performance-energy consumption optimization for at least two processing system components; and a scheduler, configured to schedule a plurality of phases of execution of the software on the at least two processing components, in dependence on the stored energy-performance characteristics, for each of the respective phases of execution of the software and at least one system-level energy criterion.

The energy-performance characteristic may be supplied with the software to the processing system. The energy-performance characteristic may also be determined empirically by monitoring the execution of the software phases.

A plurality of software phases may be executed concurrently or sequentially.

A plurality of software phases of the same or different software tasks, having similar or complementary energy-performance characteristics, may be scheduled in sequence.

A stored energy-performance characteristic of a software task, comprising a plurality of software phases, may also be read and utilized. The scheduling may attempt to schedule a plurality of software tasks having a similar energy-performance characteristic back-to-back.

It is a further object to provide a computing system, comprising: a memory configured to store energy-performance characteristics of a plurality of different phases of execution of software; a plurality of processing components, each having a selectable performance-energy consumption tradeoff, an execution of each software phase on the plurality of processing components being associated with a consumption of a variable amount of energy in dependence on at least the selectable performance-energy consumption tradeoff; and a scheduler, configured to schedule a plurality of software phases on the plurality of processing components, in dependence on the stored energy-performance characteristics and at least one system-level energy consumption-related criterion.

The scheduler may schedule tasks in dependence on at least a similarity of classification of an energy-performance characteristic of a phase with respect to a prior scheduled phase.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts the control system aspects of the present data center management strategy.

FIG. 2A depicts the state of affairs in prior art servers and shows how the power dissipation and energy efficiency of a typical server varies with server utilization.

FIG. 2B depicts the intended overall impact of the present solution on server power dissipation and server energy efficiency plotted against server utilization.

FIG. 3 shows a block diagram of a prior art computing system.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

According to a prototype embodiment, a scaled down data center is provided which demonstrates a unique approach to addressing the data center energy crisis. The energy spent on the computing equipment and by the cooling system is treated as a first class resource and managed explicitly in the present approach in a proactive as well as reactive manner. Instead of the traditional approach of cooling the server racks uniformly, dynamic and directed cooling is employed, that skews the cooling efforts to match the actual and projected cooling demands of the individual or groups of racks. Cooling for a rack is controlled based on sensors (i.e., a reactive control), a prospective set of tasks or functions in a queue (i.e., a proactive control), and an operating system component of each subsystem which permits a modification of energy demand.

It is noted that a cooling system may have higher efficiency when cooling a relatively hotter server than a cooler one; overall efficiency may therefore be increased by permitting some server racks to run near a maximum operating temperature while other racks are essentially deactivated, pending recruitment at peak demand. While running at relatively higher temperatures may reduce the mean time between failures (MTBF), the usable life of blades in a data center is typically well in excess of their economic life; further, even if there is a failure, the data center will typically have automatic failover fault tolerance systems. Indeed, racks in the data center that are specifically designed to always run near peak capacity and high temperature may be configured for more efficient operation, for example with greater spacing from other racks, to permit better heat load shedding without affecting adjacent racks, and with components rated for higher temperatures.

It is also noted that in some cases it is not the temperature per se that adversely impacts the MTBF of a system, but rather thermal cycling and the resulting mechanical stresses on components, circuit boards, and packaging. In such cases, operating a rack at a consistently hot temperature may be an advantage over a system that seeks, for example, a uniform minimum temperature across all racks that varies with data center load.

One embodiment of the technology improves the overall energy efficiency of a data center in a holistic manner, and targets both the energy expended in operating the equipment and the energy expended in the cooling system. A key aspect is to coordinate the activities of all of the energy consumers in a data center. These consumers include the individual servers and communication infrastructures as well as the cooling system components. Some current solutions to this problem have addressed inefficiencies in the use of power conversion devices, the cooling system and the servers themselves [Sh 09, BH 07, BH 09, LRC+ 08]. Emerging solutions have also started to address the need to coordinate the activities of these consumers [BH 09, NSSJ 09, SBP+ 05, TGV 08]. As an example, the work of [TGV 08] has proposed an approach for minimizing the energy expended on the cooling equipment by minimizing the inlet temperature through appropriate job scheduling. The work of [NSSJ 09] coordinates the energy expended on the computing equipment and the cooling infrastructure and allocates energy budgets to virtual machines (VMs). Such VM energy budgets are not easy to implement, as the energy expended by a VM is not easy to track and control; the simplifications used ignore energy dissipation in many related components. In general, emerging solutions have a number of potential limitations:

- The energy and performance overheads associated with job rescheduling, VM management and server-local scheduling are ignored. The communication infrastructures within a data center are heavily utilized and prone to congestion, resulting in significant added energy dissipation if jobs are rescheduled.
- A simple rescheduling of the jobs may not make the most energy-efficient use of the servers and racks—the operating configurations of such servers have to be continuously adapted to fit the characteristics of the workload.
- Simple reactive control systems, as proposed in all existing and emerging solutions, do not address the problem of thermal lags and delays associated with temperature sensors, whose inputs are used by the actuators in these systems.
- The implicit assumption in most current systems that all servers and racks have a uniform external cooling requirement may not be the best one for improving overall energy efficiency. While some proportional cooling exists in the form of automatically adjusted CPU cooling fan and enclosure fan speeds, external cooling systems are generally uniform and oblivious to the specific cooling needs of an entire rack. In general, higher energy efficiency will result from redirecting additional cooling to the regions that can benefit from it, yielding a dynamic, directed cooling system.

The present approach allocates energy budgets to servers, racks, storage and communication components and adapts the cooling effort dynamically to match the energy dissipated in these components. The energy consumption in the computing components is modeled using accurate empirical formulas, and server-local (and global) scheduling techniques are used to limit server energy consumption to the allocated budget. This is a far more practical approach than any scheme that operates on the basis of energy budget allocations to VMs. The energy dissipation estimates from these empirical models are used to schedule the energy budgets for the computing equipment and the dynamic cooling system, along with the workload. Last but not least, the present control system uses both proactive and reactive control mechanisms to manage the data center effectively in the face of sudden workload variations and to mitigate the latencies associated with the activation and deactivation of servers and VMs.
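Combining proactive and reactive control is commonly done with a feedforward term plus a feedback term. The sketch below shows that structure for a single cooling actuator; it is an illustration under stated assumptions, not the control law of the description. The function name, the `watts_per_unit` conversion (heat removed per unit of airflow), the proportional gain `k_p` and the actuator limits are all hypothetical.

```python
def airflow_command(predicted_heat_w, measured_c, target_c,
                    watts_per_unit=8.0, k_p=20.0, lo=0.0, hi=1000.0):
    """Combine a proactive (feedforward) term driven by the predicted heat
    load with a reactive (proportional feedback) term driven by the sensed
    temperature error, then clamp to the actuator's range. Units and gains
    are hypothetical (e.g. airflow units per rack)."""
    feedforward = predicted_heat_w / watts_per_unit  # acts before heat reaches lagging sensors
    feedback = k_p * (measured_c - target_c)         # corrects whatever the model got wrong
    return min(hi, max(lo, feedforward + feedback))


print(airflow_command(5000.0, 27.0, 25.0))   # 665.0
print(airflow_command(5000.0, 24.0, 25.0))   # 605.0
print(airflow_command(20000.0, 30.0, 25.0))  # 1000.0 (clamped)
```

The feedforward term is what addresses the sensor lag noted among the limitations of purely reactive schemes: cooling is raised as soon as a heavy workload is scheduled, and the feedback term trims the residual error once sensors respond.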

In current data centers, the software infrastructures (including the Linux OS and popular file systems) are very limited in their adaptation capabilities in this respect. The most popular adaptation mechanism is dynamic voltage and frequency scaling (DVFS) on the processing cores, while other components of the computing platform are left unaddressed. This is undesirable from the standpoint of energy efficiency, as the energy dissipated in the DRAM modules, the backplane and other communication infrastructures together accounts for about 45% of the total energy expended by a server, while the processors consume about 30% [BH 09]. Current measurements seem to indicate that processor energy dissipation will continue to decrease relative to the energy dissipation within the other components of a server [BH 09]. At the server level, it is thus critical to incorporate mechanisms that address energy dissipation across all major components of a server instead of focusing only on the processing cores.

At the data center level, the energy expended in the communication infrastructures (switches, in particular) and in the cooling system itself should also be considered. The present approach treats the total energy expended in the computing, storage, communications and cooling systems as an explicitly scheduled resource and schedules the computing and cooling resources within a common framework. The end goal is to maximize the energy efficiency of the data center, consistent with the performance goals. As discussed above, a cost optimization paradigm may also be implemented. In a cost optimization, the costs and benefits are normalized, and the set of conditions with the maximum net benefit is selected. The costs in this case may be energy costs, though other costs can also be considered in the calculation, such as maintenance costs, operating costs, license fees, etc. The benefits are typically considered to be the net work output of the system, e.g., computing results, though values may also be placed on the speed, latency, accuracy and completeness of the result. Indeed, for the same computational task, the result may be worth more to some users than to others. Thus, the energy efficiency considerations may be modified or distorted by a variety of extrinsic factors. The cost optimization factors may be analyzed in a centralized controller, which permits an allocation of tasks at a scheduler or load balancer element; distributed to the various processing cores and made part of the modified operating system kernel; or handled by a hybrid approach. Of course, other elements may also provide these functions.
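The cost optimization can be illustrated by normalizing each candidate operating point to a net benefit (value of the work minus energy cost minus other costs) and choosing the maximum. The operating points, their values, and the two prices below are hypothetical; the example shows how a change in energy price, such as a diurnal peak, can flip the choice.

```python
def best_operating_point(points, price_per_kwh):
    """points: candidate operating points, each with the value of the work it
    delivers, the energy it needs (kWh) and any other normalized cost.
    Returns the name of the point with the largest net benefit and that benefit."""
    def net(p):
        return p["value"] - p["energy_kwh"] * price_per_kwh - p.get("other_cost", 0.0)

    best = max(points, key=net)
    return best["name"], net(best)


points = [
    {"name": "A", "value": 100.0, "energy_kwh": 200.0},  # slower, frugal
    {"name": "B", "value": 120.0, "energy_kwh": 300.0},  # faster, more valuable, costlier
]
print(best_operating_point(points, 0.10))  # off-peak price: B wins
print(best_operating_point(points, 0.30))  # peak price: A wins
```

Other costs such as maintenance or license fees would enter through the optional `other_cost` field, on the same normalized scale as the value.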

Example Use: Integrated, Dynamic Management of Computing and Cooling Resources

The system preferably makes the best use of the energy expended in operating the computing and communication equipment as well as the energy expended in the cooling system. The energy expended by the computing and communication equipment and by the cooling system is treated as a first-class resource and managed explicitly. Servers are allocated individual energy budgets, and a modified Linux kernel in each server dynamically adjusts the system settings and performs local scheduling to stay within that server's energy budget. The energy budgets for servers/racks are computed, and the cooling system is controlled to effectively define a thermal envelope (that is, a cap) for each server/rack, by a global module that senses a variety of conditions, as described later, to direct global job scheduling and to control the cooling system components, skewing the cooling effort across racks and regions as needed to improve the overall efficiency of the cooling system.

Another distinguishing feature of a preferred embodiment of the system is its use of three controls for adapting the cooling system: the air flow rate directed at the racks from portable CRACs, the inlet temperature, and movable baffles that redirect air flow. Traditional solutions have largely looked at one or two of these adaptation techniques (mostly inlet temperature and, more rarely, air flow rate).

Using the terminology of [RRT+ 08], the integrated data center management technique is essentially a control system with the following components critical to the management:

- Sensors: On the thermal/mechanical side, the sensors monitor the temperature and air flow rates in various parts of the rack and the room. On the computing side, the sensors are in the form of hardware instrumentation counters within the processing cores, counters for device and system utilizations maintained by the operating systems, variables that record the incoming queue size and others.
- Actuators: Our management policy exercises various actuators to adapt the cooling system and the servers. On the thermal/mechanical side, the actuators adjust fan rates for regulating the air flow from CRACs, operate servo motors to adjust the baffles for air flow direction and use electromechanical subsystems to adjust the inlet temperature. On the computing side, the software elements used as actuators (a) control the voltage and frequency settings of the cores and activate/deactivate individual cores to ensure that they do not exceed their allocated energy budget and to respond to thermal emergencies at the board/component level; (b) schedule ready processes assigned to a server and adjust core settings (using (a)) to maximize the energy efficiency of the server; (c) perform global task scheduling and virtual machine activation, migration and deactivation based on the dynamically computed thermal envelopes and rack/server level energy budgets.
- Controllers: The control policy itself comprises two parts, proactive and reactive, which are described in detail below.

FIG. 1 depicts the control system aspects of one embodiment of a data center management strategy. This control system uses a combination of proactive and reactive strategies:

Proactive strategies: two different types of dynamic proactive management of data centers are provided. These are:

- 1. Because of thermal lags, temperature sensors are unable to detect the onset of thermal emergencies caused by sudden bursts of activity within server components, including the DRAM, the cores, local (swap) disks, if any, and the network interfaces. Empirical power models for server energy dissipation are therefore derived, using activity counters maintained within the operating system and the built-in hardware instrumentation counters, as described below. The energy dissipation of an individual server is estimated from sampled measurements of its activities (similar to the approach described in [PKG 01]). This estimate of the energy dissipated by a server within a sampling interval is used to guide local scheduling and to control the local system settings. The estimated energy dissipations of the servers within a rack are also used as inputs to a fast, optimized and calibrated thermal model that provides data on thermal trends, taking the environmental conditions into account. The computed thermal trends are used, in turn, to guide global and rack-level job scheduling and VM management, and to proactively direct cooling efforts toward a region of rising temperature or a hot spot.
- 2. The front-end queues of the switches used for load balancing are a good indicator of the computing load offered to a server. These queues are therefore monitored to proactively schedule new jobs in a manner that improves the overall energy efficiency of the data center. This proactive monitoring of the input queues also permits some of the latencies involved in activating racks and servers that are in standby mode, as well as some of the latencies of VM migration, to be absorbed. In fact, as described below, proactive monitoring of the incoming queues of the load balancing switches also permits VMs to be activated, deactivated and migrated while taking into account the energy overhead of such management.

Reactive strategies: The reactive strategies include the following sub-strategies:

- 1. A management system to ensure that the energy consumption of the individual servers does not exceed their dictated energy budget. This subsystem controls the computing components as well as the network interface. This management system is part of the modified Linux kernel of the servers that uses a server power estimation model and the sampled value of the instrumentation counters found in modern microprocessors and other statistics maintained by the kernel to control system settings (including the DVFS settings).
- 2. A subsystem within the kernel that reacts to local and neighborhood thermal emergencies or trends, as detected by local/neighborhood temperature sensors and by the information generated by the fast thermal models/analyzer, by either shutting down individual servers/racks or reconfiguring server settings to reduce their energy dissipation. This subsystem is an added protection mechanism that works in conjunction with the other energy management systems, both reactive and proactive, and deals with high-impact unexpected emergencies such as CRAC failures.
- 3. In conjunction with (2) above, a subsystem that monitors the local/neighborhood thermal trends to allocate and direct local cooling capacity in a focused fashion, minimizing the energy consumption of the cooling system. This subsystem operates on a slower time scale than the reactive computing strategies. The computing approach of (2) above and this thermal systems approach should operate synergistically to minimize the overall global energy usage while maintaining compute performance. The reactive controller constantly trades off energy minimization between the computing and thermal systems activities.
- 4. A subsystem within the global budgeting module that reacts to global thermal emergencies based on sensed environmental conditions in the room and trend data computed by the fast thermal model/analyzer.
- 5. A subsystem within the global budgeting module that reacts to the data on actual server/rack utilizations to throttle down servers/racks as needed.

The overall goal of all of the control system elements, both proactive and reactive, is to maximize the overall system performance under the energy constraints dictated by the budgeting module. The budgeting module ensures that the relative components of the energy dissipated by the computing/communication elements and the cooling system are optimal.

Server Management

The goal of our proposed effort is to improve the overall energy efficiency of the servers and the cooling system. To do this, we attempt to minimize the number of active servers and operate them at or near their peak loading level to maximize their energy efficiency. The existence of virtual machine support makes this approach practical. At the same time, we minimize the energy consumption of the cooling system by providing just enough cooling for the active servers. FIG. 2A depicts the state of affairs in current servers and shows how the power dissipation and energy efficiency of a typical server vary with server utilization. As seen in FIG. 2A, the energy efficiency is quite low at low server loading (utilization), while the power dissipation remains relatively high. FIG. 2A also depicts the typical operating points of servers: the typical average server loading is significantly lower than the peak loading, and as a result the overall energy efficiency is quite low at these typical operating points.

FIG. 2B depicts the intended overall impact of the present technology on server power dissipation and server energy efficiency, plotted against server utilization. First, the present multi-tiered server power management technique (which subsumes standard power management techniques) improves the server energy efficiency dramatically and simultaneously reduces the power dissipation at lower server utilization levels, so the overall server efficiency remains quite high at the typical load levels and across a wider range of loading, as shown in FIG. 2B. Second, by globally scheduling more work to fewer active servers (and keeping the non-active servers in a standby state), we push the workload level on individual servers toward the region where energy efficiency is very high. Based on a quick back-of-the-envelope calculation, the expected result is a solution that can enhance the overall energy efficiency of servers by about 15% to 25% on average beyond the state of the art, even when the added overhead of the present solution is factored in. Improvements in power savings are expected to be similar. One downside of operating servers at or near their peak capacity is that any sudden change in the behavior of their assigned workload can drive up switching activity and lead to local thermal emergencies.
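The consolidation argument can be checked with a small worked example. The sketch below assumes a simple linear (non-energy-proportional) server power model with hypothetical idle, peak and standby powers; these figures are illustrative assumptions, not measurements, and the comparison is not meant to reproduce the 15% to 25% estimate above.

```python
import math

# Hypothetical server power figures (W); not measured values.
P_IDLE = 150.0     # power at 0% utilization
P_PEAK = 250.0     # power at 100% utilization
P_STANDBY = 10.0   # power of a server parked in standby


def server_power(u: float) -> float:
    """Linear power model for one server at utilization 0 <= u <= 1."""
    return P_IDLE + (P_PEAK - P_IDLE) * u


def spread_power(total_load: float, n_servers: int) -> float:
    """Total power when the load is spread evenly over all servers."""
    return n_servers * server_power(total_load / n_servers)


def consolidated_power(total_load: float, n_servers: int, u_target: float = 0.9) -> float:
    """Total power when the load is packed onto as few servers as possible
    (each at most u_target utilized) and the remaining servers are in standby."""
    active = max(1, min(n_servers, math.ceil(total_load / u_target)))
    return active * server_power(total_load / active) + (n_servers - active) * P_STANDBY


print(round(spread_power(30, 100)))        # 18000
print(round(consolidated_power(30, 100)))  # 8760
```

With 100 servers carrying a total load of 30 server-equivalents (30% average utilization), spreading the load costs 18 kW under these assumptions, while packing it onto 34 servers at about 88% utilization and parking the other 66 in standby costs about 8.8 kW.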

In general, servers can be more efficiently managed than presently feasible if they:

- R1) Have mechanisms to put a hard limit on server energy dissipation to avoid thermal emergencies.
- R2) Have a proactive mechanism to activate or deactivate virtual machines or servers or entire racks to match the offered load taking into account any energy and performance overhead for activation and deactivation.
- R3) Have techniques that implement a more energy-proportional relationship between server power dissipation and the server utilization, as shown in FIG. 2B.
- R4) Extend the operating region over which a server has high energy efficiency: this permits higher server energy efficiencies even at moderate load levels.

The implementation of requirements R3 and R4 leads to the situation shown in FIG. 2B. The approach to implementing these requirements in software, on existing systems, is now described.

Implementing the Requirements R1 through R4

Empirical energy dissipation models are preferably used to determine the energy consumed by a server and this estimate is used to cap the energy consumed by a server. This approach is adopted since it is not practical to use external power meters on each server to determine their energy consumption.

Empirical models for the energy dissipated by a server have been proposed in the past; the simplest of these are based on utilization data maintained by the operating system (such as core and disk utilization) and take, for example, the form:

$$P_{\mathrm{server}} = K_0 + K_1 U_{\mathrm{proc}} + K_2 U_{\mathrm{mem}} + K_3 U_{\mathrm{disk}} + K_4 U_{\mathrm{net}}$$

where the Ks are constants determined empirically and the Us are the utilizations of the processor (U_proc), memory (U_mem), disk(s) (U_disk) and network (U_net), all of which the operating system maintains and updates. Other, more complex forms may, of course, be employed. As reported in [ERK+ 08], the power estimated from this equation is quite close to the actual measured power, typically within 10%. A recent effort extends simple models of this nature to regression-based predictive models that predict server energy consumption on long-running jobs as a function of core energy dissipation, L2 cache misses and ambient temperature [LGT 08]. The model of [LGT 08] is a good starting point for our efforts. We will extend it with additional metrics obtained from the hardware instrumentation counters found in typical cores, as well as with slightly modified system calls for network/file I/O that account for the energy dissipated within network components, so that remote data access, inter-process communication and I/O activity (which were ignored in [LGT 08]) are accurately accounted for.
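The linear model above can be evaluated and calibrated with a few lines of NumPy. This is a sketch under stated assumptions: it assumes that a calibration run against an external power meter on a representative server supplies the measured power samples, and it fits the Ks by ordinary least squares. The function names and the synthetic coefficients are hypothetical.

```python
import numpy as np


def predict_power(k: np.ndarray, util: np.ndarray) -> np.ndarray:
    """Evaluate P_server = K0 + K1*U_proc + K2*U_mem + K3*U_disk + K4*U_net.

    k    -- coefficients [K0, K1, K2, K3, K4], shape (5,)
    util -- rows of [U_proc, U_mem, U_disk, U_net], shape (n, 4)
    """
    return k[0] + util @ k[1:]


def fit_power_model(util: np.ndarray, measured_power: np.ndarray) -> np.ndarray:
    """Estimate the Ks by ordinary least squares from calibration samples."""
    design = np.column_stack([np.ones(len(util)), util])  # column of 1s for K0
    k, *_ = np.linalg.lstsq(design, measured_power, rcond=None)
    return k


# Synthetic, noise-free calibration data from hypothetical coefficients (W).
rng = np.random.default_rng(0)
util = rng.uniform(0.0, 1.0, size=(200, 4))
true_k = np.array([120.0, 90.0, 30.0, 15.0, 10.0])
k_hat = fit_power_model(util, predict_power(true_k, util))
print(np.round(k_hat, 2))
```

Real meter readings are noisy, so the recovered Ks would only approximate the underlying values; the fitted model can then be evaluated cheaply from OS counters on every server without a meter.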

To track and predict the energy consumption of servers in software, sampled measurements of the hardware instrumentation counter values and OS-maintained counters for computing utilization will be used, in a manner reminiscent of our earlier work [PKG 01]. The modified thread scheduler in contemporary Linux kernels will use these sampled measurements to guide local scheduling within a server, so that the server's energy consumption within a sampling period stays within the limit prescribed by the global energy/workload scheduler. In addition to the traditional DVFS adjustments, the behavior of threads within the sampling periods will be classified as CPU bound, disk bound or network bound, and similar threads will be scheduled back-to-back to avoid unnecessary changes in the DVFS settings (and thereby the energy overhead and relatively long latencies of changing such settings). This in turn addresses Requirements R3 and R4. The modified scheduler will also react to thermal emergencies detected by external temperature sensors, which the scheduler itself reads and records periodically on scheduling events within the kernel.
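One simple way to keep a server within its per-period limit is to step the DVFS setting at the end of each sampling period, based on the energy estimated for that period. The sketch below is a user-space illustration, not kernel code; the frequency table, the 85% headroom hysteresis and the `BudgetController` name are all assumptions. The energy estimate fed to it would come from a model such as the utilization-based one above.

```python
from dataclasses import dataclass

# Hypothetical DVFS operating points (GHz), lowest to highest.
DVFS_LEVELS = [1.6, 2.0, 2.4, 2.8, 3.2]


@dataclass
class BudgetController:
    budget_j: float                    # per-period energy budget from the global scheduler
    level: int = len(DVFS_LEVELS) - 1  # start at the highest setting
    headroom: float = 0.85             # step up only below 85% of the budget

    def end_of_period(self, estimated_j: float) -> float:
        """Adjust the DVFS level from the energy estimated for the period just
        ended and return the frequency to use for the next period."""
        if estimated_j > self.budget_j and self.level > 0:
            self.level -= 1            # over budget: throttle one step
        elif (estimated_j < self.headroom * self.budget_j
              and self.level < len(DVFS_LEVELS) - 1):
            self.level += 1            # ample slack: restore one step
        return DVFS_LEVELS[self.level]


ctl = BudgetController(budget_j=100.0)
print([ctl.end_of_period(e) for e in [120.0, 110.0, 95.0, 70.0]])  # [2.8, 2.4, 2.4, 2.8]
```

The gap between the budget and the 85% step-up threshold keeps the controller from oscillating between two settings when the estimate hovers near the budget.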

Requirement R2 is implemented in the global scheduler, as described below, by keeping track of workload trends (through monitoring of the incoming request queues at the load balancing switches) and job completion statistics. If the global scheduler sees growth in the job arrival rate, it activates VMs/servers/racks as needed to cope with the additional workload. The overhead of such activation and deactivation, including the energy cost of moving VM contexts, is accounted for in this process, and thus requirement R3 is also addressed.
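The activation decision can be sketched as follows, assuming the queue-trend monitoring already yields a predicted arrival rate and that each server has a known sustainable request rate. The 85% target utilization, the idle power, the prediction horizon and the per-server transition energy are hypothetical parameters. Scaling down only when the idle energy saved over the horizon exceeds the transition overhead is one possible way of accounting for activation/deactivation costs, not a formula given in the source.

```python
import math


def servers_needed(predicted_rate: float, per_server_rate: float,
                   target_util: float = 0.85) -> int:
    """Servers required to serve the predicted arrival rate with each
    active server loaded to at most target_util of its capacity."""
    return max(1, math.ceil(predicted_rate / (per_server_rate * target_util)))


def plan_activation(active: int, predicted_rate: float, per_server_rate: float,
                    idle_power_w: float, horizon_s: float,
                    transition_energy_j: float) -> int:
    """Return the change in the number of active servers (positive = activate).

    Scaling up is always allowed so that QoS is not violated. Scaling down
    happens only if the idle energy saved over the prediction horizon exceeds
    the energy overhead of deactivating the surplus servers (including any
    VM migration they require).
    """
    needed = servers_needed(predicted_rate, per_server_rate)
    if needed >= active:
        return needed - active
    surplus = active - needed
    saved_j = surplus * idle_power_w * horizon_s
    overhead_j = surplus * transition_energy_j
    return -surplus if saved_j > overhead_j else 0


# active, predicted req/s, req/s per server, idle W, horizon s, transition J
print(plan_activation(10, 1200.0, 100.0, 150.0, 600.0, 20000.0))  # 5
print(plan_activation(10, 300.0, 100.0, 150.0, 600.0, 20000.0))   # -6
print(plan_activation(10, 300.0, 100.0, 150.0, 60.0, 20000.0))    # 0
```

In the last call the predicted lull is too short for deactivation to pay back its overhead, so the surplus servers are kept active.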

Techniques for message consolidation, which pack several short messages into a single message within a jumbo Ethernet frame at the network interface to amortize the fixed component of the per-packet overhead of network transfers, may also be employed. This also addresses Requirement R3.
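A user-space sketch of the packing step follows. It assumes a 9000-byte jumbo-frame payload and a hypothetical 2-byte length prefix per message so that the receiver can split the frame again; the source places this logic in the network interface, which this sketch does not model.

```python
import struct

JUMBO_PAYLOAD = 9000              # bytes; a common jumbo-frame size, assumed here
LEN_PREFIX = struct.Struct("!H")  # hypothetical 2-byte length header per message


def consolidate(messages: list[bytes], limit: int = JUMBO_PAYLOAD) -> list[bytes]:
    """Greedily pack short messages, in order, into as few payloads of at
    most `limit` bytes as possible, so per-packet overhead is paid once per
    payload rather than once per message."""
    frames, current = [], bytearray()
    for msg in messages:
        record = LEN_PREFIX.pack(len(msg)) + msg
        if len(record) > limit:
            raise ValueError("message too large for a single frame")
        if len(current) + len(record) > limit:
            frames.append(bytes(current))
            current = bytearray()
        current += record
    if current:
        frames.append(bytes(current))
    return frames


def split(frame: bytes) -> list[bytes]:
    """Inverse of consolidate() for one payload: recover the messages."""
    out, pos = [], 0
    while pos < len(frame):
        (n,) = LEN_PREFIX.unpack_from(frame, pos)
        pos += LEN_PREFIX.size
        out.append(frame[pos:pos + n])
        pos += n
    return out


msgs = [bytes([i]) * 200 for i in range(100)]  # 100 short 200-byte messages
frames = consolidate(msgs)
print(len(frames))  # 3
```

Each 200-byte message becomes a 202-byte record, so 44 records fit per 9000-byte payload and 100 messages travel in 3 frames instead of 100 packets.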

A different way of amortizing the scheduling overhead (including the changing of the DVFS settings of cores) exploits the characteristics of repetitive jobs. In a typical server installation, the number of such jobs is expected to be quite high. For example, repetitive jobs of the SPECweb 2006 benchmarks on a Linux platform (with Intel E5460 cores) running Apache were dynamically classified into two classes, compute bound and I/O bound, based on utilization statistics maintained by the kernel and instruction commit rate data maintained in the hardware instrumentation counters. This classification data was maintained within the Apache server. Jobs of the same class in Apache's work queue were scheduled back-to-back wherever possible, and the DVFS settings of the dual core platform were explicitly controlled. Unnecessary changes in the DVFS settings were avoided, and job wait times on the queues were limited to keep performance close to that of the base case. CPU power measurements (made with a power clamp on the cable feeding the cores from the power supply to the motherboard) showed that this simple strategy reduced core power consumption by about 11%.
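The classify-then-batch strategy can be sketched as below. The two-class rule, its IPC and I/O-utilization thresholds, the per-job service-time estimates and the `max_wait` bound are assumptions for illustration; the sketch only orders a snapshot of the work queue and does not itself change DVFS settings.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    arrival: float    # arrival time on the work queue (s)
    service: float    # estimated service time (s)
    ipc: float        # instruction commit rate from hardware counters
    io_util: float    # I/O utilization from kernel statistics


def classify(job: Job, ipc_threshold: float = 1.0, io_threshold: float = 0.3) -> str:
    """Two-class rule; both thresholds are hypothetical."""
    if job.io_util >= io_threshold and job.ipc < ipc_threshold:
        return "io"
    return "compute"


def batch_order(jobs, start: float, max_wait: float) -> list:
    """Return job ids in dispatch order. Same-class jobs run back-to-back;
    the class (and hence the DVFS setting) switches only when the current
    class runs dry or the head job of the other class has waited > max_wait."""
    queues = {"compute": deque(), "io": deque()}
    for job in sorted(jobs, key=lambda j: j.arrival):
        queues[classify(job)].append(job)
    order, now, current = [], start, None
    while queues["compute"] or queues["io"]:
        if current is None or not queues[current]:
            # begin with (or fall back to) the class holding the oldest job
            current = min((c for c in queues if queues[c]),
                          key=lambda c: queues[c][0].arrival)
        else:
            other = "io" if current == "compute" else "compute"
            if queues[other] and now - queues[other][0].arrival > max_wait:
                current = other  # enforce the waiting-time bound
        job = queues[current].popleft()
        order.append(job.job_id)
        now += job.service
    return order


jobs = [
    Job(1, 0.0, 1.0, ipc=2.0, io_util=0.05),
    Job(2, 0.1, 1.0, ipc=0.5, io_util=0.60),
    Job(3, 0.2, 1.0, ipc=1.8, io_util=0.10),
    Job(4, 0.3, 1.0, ipc=0.4, io_util=0.70),
    Job(5, 0.4, 1.0, ipc=2.2, io_util=0.02),
]
print(batch_order(jobs, start=0.5, max_wait=10.0))  # [1, 3, 5, 2, 4]
print(batch_order(jobs, start=0.5, max_wait=1.0))   # [1, 2, 3, 4, 5]
```

With a loose bound the compute-bound jobs run as one batch followed by the I/O-bound batch, so the DVFS setting changes once; with a tight bound the waiting-time limit forces interleaving, trading extra DVFS changes for lower latency.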

For the present system, this technique can be moved to the kernel level for added efficiency, and the classification can be extended with memory-bound jobs (jobs that trigger a high proportion of RAM activity, as evidenced by the on-chip cache-miss instrumentation counter) and network-bound job classes, for instance. This classification information is used to schedule jobs onto processor sockets whose independently preset performance settings match the job characteristics, or onto cores within a multicore chip that permits similar preset performance settings to be used independently for each core. The preset performance settings are changed only when load increases saturate the capacity of a core at a specific DVFS setting. This approach of exploiting pre-classified jobs addresses requirements R3 and R4 simultaneously.

Global Energy Budget Allocation and Workload Scheduling

The global scheduler (GS) of a preferred embodiment of the system is responsible for the allocation of energy budgets for the VMs/servers/racks and the assignment of workload to the individual machines. The key requirement of the GS is that it has to be fast and scalable. The GS may be implemented on a few dedicated multicore machines which also implement the compact thermal analyzer and models. Multiple machines may be used to permit scalability; for a small server installation, it may be possible to implement all of the functions on a single multicore platform. These dedicated machines may also receive data from a variety of sources, which are optional, as shown in FIG. 1.

The GS maintains a variety of tables that record the energy/performance characteristics of each rack, its utilization statistics, and data on the environmental temperature computed from various sources. The GS also maintains a list of quality of service (QoS) requirements (guaranteed transaction rates, data delivery rates, etc.) for implementing differentiated services. The GS also senses the incoming work queue sizes at the load balancing switches and uses simple workload models to predict the impact of the incoming workload. These workload models can simply classify incoming jobs by request type or use more detailed information about pre-classified repetitive jobs. The GS schedules the workload to maximize the workload allocated to active servers/racks, assuming VM support on all nodes. This allocation uses thermal data obtained from the compact model and from thermal sensors, with the service guarantees as a constraint. Cooling requirements and changes to the energy budget of the computing/storage and communication equipment for the allocated workload are also assigned based on a variety of heuristics. Some possible heuristics include (but are not limited to):

- Extrapolate the thermal output of each active server and revise its energy budget and cooling requirement based on updates to the number of jobs (existing plus newly assigned) assigned to the server.
- Use the energy requirement characteristics of known, repetitive jobs and the heuristic above for unclassified jobs to plan the schedule.
- Use the data maintained on the average job completion rate and average energy requirement of jobs to guide the allocations.
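A minimal sketch of how the second and third heuristics might drive a greedy placement is shown below. It assumes that each job carries a class label, that recorded per-class energies exist for known repetitive jobs, and that a running average energy per job covers unclassified jobs. The `Server` fields, the best-fit rule and the 27 °C inlet cap are hypothetical; the sketch also does not re-extrapolate temperatures as jobs are added, as the first heuristic would.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    budget_j: float          # energy budget for the next interval (J)
    predicted_temp_c: float  # from the compact thermal model and sensors
    committed_j: float = 0.0
    jobs: list = field(default_factory=list)

    @property
    def headroom(self) -> float:
        return self.budget_j - self.committed_j


def estimate_energy(job_class, known_classes, average_job_j):
    """Recorded energy for known repetitive job classes; the average
    energy per job for unclassified jobs."""
    return known_classes.get(job_class, average_job_j)


def allocate(jobs, servers, known_classes, average_job_j, temp_cap_c=27.0):
    """Best-fit packing of (job_id, job_class) pairs onto active servers
    within energy-budget and thermal limits. Returns ids of jobs that could
    not be placed, i.e. a signal to activate more capacity."""
    def need(job):
        return estimate_energy(job[1], known_classes, average_job_j)

    unplaced = []
    for job in sorted(jobs, key=need, reverse=True):  # largest jobs first
        fits = [s for s in servers
                if s.headroom >= need(job) and s.predicted_temp_c < temp_cap_c]
        if not fits:
            unplaced.append(job[0])
            continue
        target = min(fits, key=lambda s: s.headroom - need(job))  # tightest fit
        target.committed_j += need(job)
        target.jobs.append(job[0])
    return unplaced


servers = [Server("A", 500.0, 24.0), Server("B", 300.0, 25.0), Server("C", 1000.0, 29.0)]
known = {"static": 50.0, "search": 200.0}
jobs = [(1, "search"), (2, "search"), (3, "static"), (4, "unknown"), (5, "search")]
print(allocate(jobs, servers, known, average_job_j=100.0))  # []
print({s.name: s.jobs for s in servers})  # {'A': [2, 5, 4], 'B': [1, 3], 'C': []}
```

Best fit packs work tightly onto the cooler active servers, and the hot server C receives nothing even though it has the largest budget.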

As mentioned earlier, the GS keeps track of the job dispatch rate and the size of the incoming queues in the front-end load balancing switches to follow the workload trend. This trend data is used to activate or deactivate servers and racks and to redirect cooling efforts as needed. The energy expended in such activation/deactivation and in migrating VMs, where necessary, is accounted for in the allocations.

Alternative scheduling may also be employed, including ones that dynamically switch scheduling strategies based on the thermal conditions and current workload. As an example, if all servers are being operated in the high energy-efficiency region as shown in FIG. 2B, then it may be better to perform an allocation that balances the load across the racks to avoid the formation of hot spots in the server room.

The GS has similarities with data center configuration systems and managers from several vendors (e.g., IBM's Tivoli suite) [IBM 08a, IBM 08b]. However, the present system differs from these schedulers at least in making server energy dissipation estimates at a finer granularity, in using a thermal model to predict and cope with thermal conditions, and in using dynamic cooling systems.

Control Systems Issues

The present technique is essentially a control system that employs reactive as well as proactive actuations to meet the goal of improving the overall energy efficiency of a data center. As such, it has to be scalable, stable and provide appropriate sense-and-actuate latencies. Another important requirement of the system is that the various control elements should act in a synchronized and coordinated manner, avoiding “power struggles” [RRT+ 08], where one control loop fights against another inadvertently.

On the computing side, these control system requirements are met by using a hierarchical implementation with independent control elements at each level, and by using a progressive top-down approach in which the energy/performance goals of each level are explicitly dictated by the control system at the level immediately above. The hierarchical control of the activities within a computing system also ensures its scalability: separate control loops monitor and manage the energy budgets at the rack level and at the level of individual servers within the rack. For large data centers, another level can be added to make the system more scalable, based on the allocation and control of the energy budgets for a set of neighboring racks.
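The top-down budget flow can be illustrated with a recursive split over a data center, rack and server tree. The proportional scaling rule used here is an assumption (the source does not specify how a parent's budget is divided among its children), and slack is not redistributed when demand is below the budget. All names and figures are hypothetical.

```python
def total_demand(node) -> float:
    """A leaf is a server's requested energy; an inner node is a dict of children."""
    if isinstance(node, dict):
        return sum(total_demand(child) for child in node.values())
    return float(node)


def split_budget(parent_budget: float, demands: dict) -> dict:
    """Grant every request if they fit; otherwise scale all requests by the
    same factor so the children's budgets sum to the parent's budget."""
    total = sum(demands.values())
    if total <= parent_budget:
        return dict(demands)
    scale = parent_budget / total
    return {name: d * scale for name, d in demands.items()}


def allocate_tree(budget: float, node):
    """Each level's budgets are dictated by the level immediately above."""
    if not isinstance(node, dict):
        return budget  # leaf: this server's budget
    shares = split_budget(budget, {k: total_demand(c) for k, c in node.items()})
    return {k: allocate_tree(shares[k], c) for k, c in node.items()}


datacenter = {"rack1": {"s1": 300, "s2": 200}, "rack2": {"s3": 500, "s4": 250}}
print(allocate_tree(1000.0, datacenter))
```

Here the total demand of 1250 units exceeds the 1000-unit budget, so both racks, and then their servers, are scaled to 80% of their requests: 240 and 160 for rack1, 400 and 200 for rack2. In a running system the rack-level and server-level loops would repeat this split on their own schedules.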

The control of the computing equipment is based on the notion of update intervals within a sampling period, with sensor and model outputs collected at the end of each update interval. At the end of a sampling period, the respective sensor and model outputs are averaged, and control decisions are taken based on these average values, as introduced in [PKG 01]. This approach smooths out the impact of the bursty activities that are inevitable within a sampling interval and enables a stable control system for the computing elements.
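The update-interval and sampling-period scheme can be sketched as a small accumulator that acts only on period averages. The `SamplingController` class, its callback and the 120-unit threshold in the usage example are hypothetical.

```python
from statistics import fmean


class SamplingController:
    """Collect one sensor/model reading per update interval and take a
    control decision only at the end of each sampling period, on the average."""

    def __init__(self, updates_per_period: int, decide):
        self.updates_per_period = updates_per_period
        self.decide = decide   # callback receiving the period-average reading
        self.readings = []

    def on_update_interval(self, reading: float):
        self.readings.append(reading)
        if len(self.readings) == self.updates_per_period:
            avg = fmean(self.readings)
            self.readings.clear()
            return self.decide(avg)
        return None            # mid-period: no control action


ctl = SamplingController(4, lambda avg: "throttle" if avg > 120.0 else "hold")
readings = [90.0, 95.0, 180.0, 92.0, 130.0, 125.0, 128.0, 131.0]
print([ctl.on_update_interval(r) for r in readings])
# [None, None, None, 'hold', None, None, None, 'throttle']
```

The single 180-unit burst in the first period does not trigger throttling because the period average is 114.25, while the sustained rise in the second period (average 128.5) does.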

Hardware Overview

FIG. 3 (see U.S. 7,702,660, issued to Chan, expressly incorporated herein by reference) shows a block diagram that illustrates a computer system 400 upon which an embodiment of the invention may be implemented. Computer system 400 includes a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information. Computer system 400 also includes a main memory 406, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404. Main memory 406 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404. Computer system 400 further includes a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404. A storage device 410, such as a magnetic disk or optical disk, is provided and coupled to bus 402 for storing information and instructions.

Computer system 400 may be coupled via bus 402 to a display 412, such as a cathode ray tube (CRT) or liquid crystal flat panel display, for displaying information to a computer user. An input device 414, including alphanumeric and other keys, is coupled to bus 402 for communicating information and command selections to processor 404. Another type of user input device is cursor control 416, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.

The invention is related to the use of computer system 400 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in main memory 406. Such instructions may be read into main memory 406 from another machine-readable medium, such as storage device 410. Execution of the sequences of instructions contained in main memory 406 causes processor 404 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term “machine-readable medium” as used herein refers to any medium that participates in providing data that causes a machine to operate in a specific fashion. In an embodiment implemented using computer system 400, various machine-readable media are involved, for example, in providing instructions to processor 404 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 410. Volatile media includes dynamic memory, such as main memory 406.

Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 402. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications. All such media must be tangible to enable the instructions carried by the media to be detected by a physical mechanism that reads the instructions into a machine.

Common forms of machine-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.

Various forms of machine-readable media may be involved in carrying one or more sequences of one or more instructions to processor 404 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 400 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 402. Bus 402 carries the data to main memory 406, from which processor 404 retrieves and executes the instructions. The instructions received by main memory 406 may optionally be stored on storage device 410 either before or after execution by processor 404.

Computer system 400 also includes a communication interface 418 coupled to bus 402. Communication interface 418 provides a two-way data communication coupling to a network link 420 that is connected to a local network 422. For example, communication interface 418 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 418 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 418 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 420 typically provides data communication through one or more networks to other data devices. For example, network link 420 may provide a connection through local network 422 to a host computer 424 or to data equipment operated by an Internet Service Provider (ISP) 426. ISP 426 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 428. Local network 422 and Internet 428 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 420 and through communication interface 418, which carry the digital data to and from computer system 400, are exemplary forms of carrier waves transporting the information.

Computer system 400 can send messages and receive data, including program code, through the network(s), network link 420 and communication interface 418. In the Internet example, a server 430 might transmit a requested code for an application program through Internet 428, ISP 426, local network 422 and communication interface 418.

The received code may be executed by processor 404 as it is received, and/or stored in storage device 410, or other non-volatile storage for later execution. In this manner, computer system 400 may obtain application code in the form of a carrier wave.

In this description, several preferred embodiments were discussed. Persons skilled in the art will, undoubtedly, have other ideas as to how the systems and methods described herein may be used. It is understood that this broad invention is not limited to the embodiments discussed herein. Rather, the invention is limited only by the following claims.

References (Each of Which is Expressly Incorporated by Reference)

- U.S. Pat. No. 7,228,441 B2
- [BH 07] Luiz André Barroso and Urs Hölzle, “The Case for Energy-Proportional Computing”, IEEE Computer Magazine, December 2007.
- [BH 09] Luiz André Barroso and Urs Hölzle, “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines”, Morgan & Claypool Publishers, 2009 (ISBN 9781598295566).
- [ERK+ 08] D. Economou, Suzanne Rivoire, Christos Kozyrakis, and Parthasarathy Ranganathan, “Full-system Power Analysis and Modeling for Server Environments”, in Proc. Workshop on Modeling, Benchmarking and Simulation (MOBS) at the Int'l. Symposium on Computer Architecture, Boston, MA, June 2006.
- [IBM 08a] IBM Corporation, IBM Tivoli Usage Accounting Manager V7.1 Handbook, IBM Redbook, March 2008.
- [IBM 08b] IBM Corporation, Value Proposition for IBM Systems Director: Challenges of Operational Management for Enterprise Server Installations, IBM ITG Group, Management Brief (34 pages), November 2008.
- [Ko 07] Jonathan G. Koomey, “Estimating Total Power Consumption by Servers in the U.S. and the World”, Analytics Press, February 2007. Also available at: enterprise.amd.com/us-en/AMD-Business/Technology-Home/Power-Management.aspx.
- [LGT 08] Adam Lewis, Soumik Ghosh and N.-F. Tzeng, “Run-time Energy Consumption Estimation Based on Workload in Server Systems”, in Proc. of the HotPower 08 workshop, held in conjunction with the 2008 Usenix OSDI Symposium.
- [LRC+ 08] Kevin Lim, Parthasarathy Ranganathan, Jichuan Chang, Chandrakant Patel, Trevor Mudge, Steven Reinhardt, “Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments”, in Proc. of the 35th International Symposium on Computer Architecture, 2008, pp. 315-326.
- [NSSJ 09] Ripal Nathuji, Ankit Somani, Karsten Schwan, and Yogendra Joshi, “CoolIT: Coordinating Facility and IT Management for Efficient Datacenters”, in Proc. of the HotPower 08 workshop, held in conjunction with the 2008 Usenix OSDI Symposium.
- [PKG 01] Dmitry Ponomarev, Gurhan Kucuk and Kanad Ghose, “Reducing Power Requirements of Instruction Scheduling Through Dynamic Allocation of Multiple Datapath Resources”, in Proc. 34th IEEE/ACM International Symposium on Microarchitecture (MICRO-34), December 2001, pp. 90-101.
- [RRT+ 08] Ramya Raghavendra, Parthasarathy Ranganathan, Vanish Talwar, Zhikui Wang, and Xiaoyun Zhu, “No Power Struggles: Coordinated Multilevel Power Management for the Data Center”, in Proc. ACM Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2008.
- [Sh 09] Stephen Shankland, “Google Uncloaks Once-Secret Server”, CNET News, Business Tech, April 2009, available at: news.cnet.com/8301-1001_3-10209580-92.html.
- [SBP+ 05] Ratnesh K. Sharma, Cullen Bash, Chandrakant D. Patel, Richard J. Friedrich, and Jeffrey S. Chase, “Balance of Power: Dynamic Thermal Management for Internet Data Centers”, IEEE Internet Computing, Vol. 9, No. 1, pp. 42-49, 2005.
- [TGV 08] Qinghui Tang, Sandeep K. S. Gupta, and Georgios Varsamopoulos, “Energy-Efficient, Thermal-Aware Task Scheduling for Homogeneous, High Performance Computing Data Centers: A Cyber-Physical Approach”, IEEE Trans. on Parallel and Distributed Systems, November 2008 (vol. 19 no. 11) pp. 1458-1472.

Claims

1. A system for managing a datacenter comprising a plurality of racks, each rack having a plurality of servers, each server supporting execution of virtual machines, comprising: a computer system comprising a processor and a storage device; a set of stored tables, which record: rack energy versus performance characteristics, rack utilization statistics, and rack environmental temperature; an ordered queue of tasks to be processed by the virtual machines; a computer implemented model configured to predict an impact of a new task in the ordered queue of tasks on the rack energy versus performance characteristics, rack utilization statistics, and rack environmental temperature, based on the set of stored tables; and a scheduler, configured to optimize a net energy efficiency of the plurality of racks and cooling systems for the plurality of racks while meeting a task processing quality of service constraint for the respective tasks and a thermal constraint, by controlling: a sequence of, and a server employed by, the respective tasks in the ordered queue of tasks to meet the task processing quality of service constraint, and cooling systems for the plurality of racks in dependence on the sequence of tasks in the ordered queue of tasks prior to their execution, to meet predicted cooling needs to meet the thermal constraint.
2. The system according to claim 1, wherein the scheduler is further configured to control an activation state of respective racks of the plurality of racks, to optimize a net energy efficiency of the plurality of racks and the cooling systems for the plurality of racks while meeting the task processing quality of service constraint for the respective tasks and the thermal constraint.
3. The system according to claim 2, wherein the scheduler is further configured to maximize a workload of tasks allocated to respective racks which are controlled to be active.
4. The system according to claim 1, further comprising a memory which stores energy requirement characteristics of respective tasks and task completion rates of respective tasks, wherein the scheduler is configured to employ the energy requirement characteristics of the respective tasks and the task completion rates of the respective tasks to optimize a net energy efficiency of the plurality of racks and the cooling systems by selectively targeting execution of respective tasks on respective servers dependent on the energy requirement characteristics of the respective tasks and the task completion rates of the respective tasks.
5. The system according to claim 1, wherein the scheduler is further configured to produce an output to control a migration of a virtual machine from a first server to a second server of the plurality of racks, wherein the optimization of the net energy efficiency of the plurality of racks and the cooling systems for the plurality of racks is dependent on an energy expended in the migration of the virtual machine.
6. The system according to claim 1, wherein the scheduler is further configured to change an activation state of at least one respective rack of the plurality of racks and to control a migration of a virtual machine from a first server to a second server of the plurality of racks, wherein the optimization of the net energy efficiency of the plurality of racks and the cooling systems for the plurality of racks is dependent on an energy expended in the change of the activation state of the at least one respective rack and the migration of the virtual machine.
7. The system according to claim 1, wherein the scheduler is further configured to redirect cooling from the cooling systems from a first rack to a second rack of the plurality of racks dependent on cooling needs predicted from the ordered queue of tasks.
8. The system according to claim 1, wherein the scheduler has a first mode in which workload is concentrated on a subset of the plurality of racks, and a second mode in which workload is balanced across the plurality of racks.
9. The system according to claim 1, wherein the scheduler is responsive to a latency of the cooling systems between generation of control signals and a cooling response delivered to the plurality of racks.
10. The system according to claim 9, wherein the scheduler is further responsive to a thermal sensor for control of the cooling systems.
11. The system according to claim 10, wherein the scheduler is further responsive to a hardware instrumentation counter within at least one server for control of the cooling systems.
12. The system according to claim 1, wherein the scheduler comprises a hierarchical set of independent control elements at a plurality of levels, wherein energy versus performance goals of a respective level below a highest level are explicitly dictated by an independent control element at a hierarchically higher level.
13. The system according to claim 1, wherein the scheduler is configured to selectively change an activation state of respective racks of the plurality of racks and an operational parameter of the cooling systems at predetermined update intervals.
14. The system according to claim 1, wherein each task in the ordered queue of tasks is labeled as being compute bound or input-output bound, and wherein the scheduler is further configured to: selectively group compute bound tasks for execution on at least one server optimized for compute bound tasks; selectively group input-output bound tasks for execution on at least one server optimized for input-output bound tasks; the at least one server optimized for compute bound tasks having different characteristics from the at least one server optimized for input-output bound tasks; selectively produce a signal for optimization of a performance of a respective server for processing of compute bound tasks or input-output bound tasks by changing the characteristics of the respective server.
15. The system according to claim 1, wherein the scheduler is configured to schedule operation of the cooling systems dependent on the ordered queue of tasks.
16. The system according to claim 1, wherein the scheduler is a hierarchical scheduler comprising a higher level scheduler and a lower level scheduler, each respective server has an energy budget allocation defined by the higher level scheduler, and the lower level scheduler adjusts performance settings of the respective server and performs local scheduling for the respective server to stay within the respective server's energy budget allocation defined by the higher level scheduler.
17. The system according to claim 1, wherein the cooling system comprises three separate controls controlled by the scheduler, a first cooling system control configured to control an air flow rate, a second control configured to control an inlet temperature, and a third control configured to control moveable baffles to redirect air flow.
18. The system according to claim 15, further comprising a memory which stores energy requirement characteristics of respective tasks and task completion rates of respective tasks, wherein the scheduler is further configured to employ the energy requirement characteristics of the respective tasks and the task completion rates of the respective tasks to optimize a net energy efficiency of the plurality of racks and the cooling systems by selectively targeting execution of respective tasks on respective servers.
19. A method for managing a datacenter comprising a plurality of racks, each rack having a plurality of servers, each server supporting execution of virtual machines, comprising: storing, by a computer system comprising a processor and a storage device, a set of stored tables in a memory, which record rack performance characteristics with respect to energy consumption, rack utilization statistics, and rack environmental temperature; providing, by the computer system, an ordered queue of tasks to be processed by the virtual machines; predicting an impact of a new task in the ordered queue of tasks on the rack performance, rack energy consumption, rack utilization, and rack environmental temperature, with a computer implemented model, based on the set of stored tables; and optimizing, by the computer system, a net energy efficiency of the plurality of racks and cooling systems for the plurality of racks while meeting a task processing quality of service constraint for the tasks and a thermal constraint with a scheduler, by controlling: a sequence of, and a server employed by, the tasks in the ordered queue of tasks to meet the task processing quality of service constraint, and cooling systems for cooling the plurality of racks in dependence on the sequence of tasks in the ordered queue of tasks prior to their execution, to meet predicted cooling needs to meet the thermal constraint.
20. A system for managing a datacenter comprising a plurality of racks, each rack having a plurality of servers, each server supporting execution of virtual machines, comprising: a computer system comprising a processor and a storage device; a set of stored tables, which record actual rack energy/performance characteristics, rack utilization statistics, and rack environmental temperature, wherein the respective servers of the plurality of servers have controllable energy/performance characteristics; an ordered queue of tasks to be processed by the virtual machines; a computer implemented model configured to predict an impact of execution, on a respective virtual machine, of a new task in the ordered queue of tasks on the rack energy/performance characteristics, rack utilization statistics, and rack environmental temperature, based on the set of stored tables; and a scheduler, configured to optimize an energy required for operation of the plurality of racks and cooling systems for the plurality of racks while meeting a task processing quality of service constraint for the tasks and a thermal constraint, by controlling: a sequence of, and a server employed for, execution of the tasks in the ordered queue of tasks to meet the task processing quality of service constraint, and cooling systems for the plurality of racks in dependence on the sequence of tasks in the ordered queue of tasks prior to their execution, to meet predicted cooling needs to meet the thermal constraint.
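The claims do not prescribe a particular optimization algorithm. As a minimal, hypothetical sketch of how a scheduler along the lines of claims 1, 3, and 19 might combine the stored tables, the predictive model, and the thermal and quality-of-service constraints, the Python below uses a simple greedy placement loop. The linear thermal model, the per-rack cooling overhead, the idle-rack activation cost, the earliest-deadline-first ordering, and every name (`RackTables`, `Task`, `predict`, `schedule`) are illustrative assumptions, not part of the claimed system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RackTables:
    """Hypothetical stored-table entries for one rack (fields and units assumed)."""
    watts_per_util: float    # energy vs. performance: IT watts added per unit of utilization
    util: float              # utilization statistic, 0.0 to 1.0
    temp_c: float            # rack environmental (inlet) temperature, deg C
    degc_per_watt: float     # assumed linear thermal response to added IT power
    speed: float             # work units completed per second
    cooling_overhead: float  # cooling watts needed per added IT watt at this rack
    idle_w: float            # power cost of bringing an idle rack into use


@dataclass
class Task:
    name: str
    work: float        # work units
    util: float        # utilization the task adds to its rack
    deadline_s: float  # QoS constraint: latest acceptable completion time, seconds


def predict(rack: RackTables, task: Task) -> Tuple[float, float, float]:
    """Model: added IT watts, resulting rack temperature, completion time."""
    added_w = rack.watts_per_util * task.util
    new_temp = rack.temp_c + rack.degc_per_watt * added_w
    return added_w, new_temp, task.work / rack.speed


def schedule(queue: List[Task], racks: Dict[str, RackTables], t_max_c: float):
    """Greedy placement minimizing predicted IT + cooling (+ activation) power."""
    plan: List[Tuple[str, Optional[str]]] = []
    cooling_plan: Dict[str, float] = {}  # filled in before any task executes
    # Control the sequence: earliest deadline first (ties keep queue order).
    for task in sorted(queue, key=lambda t: t.deadline_s):
        best: Optional[Tuple[float, str, float, float]] = None
        for rid, rack in racks.items():
            if rack.util + task.util > 1.0:
                continue  # not enough capacity on this rack
            added_w, new_temp, done_s = predict(rack, task)
            if new_temp > t_max_c or done_s > task.deadline_s:
                continue  # thermal or QoS constraint would be violated
            cooling_w = rack.cooling_overhead * added_w
            cost = added_w + cooling_w
            if rack.util == 0.0:
                cost += rack.idle_w  # favor concentrating work on active racks
            if best is None or cost < best[0]:
                best = (cost, rid, cooling_w, new_temp)
        if best is None:
            plan.append((task.name, None))  # no feasible rack: defer the task
            continue
        _, rid, cooling_w, new_temp = best
        racks[rid].util += task.util  # update the stored tables with the prediction
        racks[rid].temp_c = new_temp
        cooling_plan[rid] = cooling_plan.get(rid, 0.0) + cooling_w
        plan.append((task.name, rid))
    return plan, cooling_plan


if __name__ == "__main__":
    racks = {  # positional fields follow RackTables declaration order
        "A": RackTables(400.0, 0.5, 24.0, 0.01, 10.0, 0.5, 200.0),
        "B": RackTables(300.0, 0.0, 20.0, 0.01, 10.0, 0.3, 200.0),
    }
    queue = [Task("t1", 50.0, 0.3, 10.0), Task("t2", 50.0, 0.3, 12.0),
             Task("t3", 50.0, 0.3, 2.0)]
    plan, cooling = schedule(queue, racks, t_max_c=27.0)
    print(plan)  # [('t3', None), ('t1', 'A'), ('t2', 'B')]
```

In this example, task t3 cannot meet its 2 s deadline on either rack and is deferred; t1 lands on the already active rack A even though rack B has a lower cooling overhead, because activating B carries the assumed idle cost; and t2 spills over to B once A runs out of capacity. Because `cooling_plan` is populated while the tasks are still queued, cooling could be adjusted ahead of the corresponding work, in the spirit of claim 1. A real deployment would need calibrated models and a stronger optimizer than this greedy loop.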

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a Continuation of U.S. patent application Ser. No. 15/490,525, filed Apr. 18, 2017, now pending, which is a Continuation of U.S. patent application Ser. No. 12/841,160, filed Jul. 21, 2010, now abandoned, which claims benefit of priority from U.S. Provisional Patent Application No. 61/227,361, filed Jul. 21, 2009, each of which is hereby expressly incorporated by reference in its entirety.

US Referenced Citations (522)

Number	Name	Date	Kind
4823290	Fasack et al.	Apr 1989	A
4962734	Jorgensen	Oct 1990	A
5080496	Keim	Jan 1992	A
5095712	Narreau	Mar 1992	A
5216623	Barrett et al.	Jun 1993	A
5367670	Ward et al.	Nov 1994	A
5410448	Barker, III et al.	Apr 1995	A
5462225	Massara et al.	Oct 1995	A
5581478	Cruse et al.	Dec 1996	A
5657641	Cunningham et al.	Aug 1997	A
5682949	Ratcliffe et al.	Nov 1997	A
5718628	Nakazato et al.	Feb 1998	A
5735134	Liu et al.	Apr 1998	A
5781787	Shafer et al.	Jul 1998	A
5850539	Cook et al.	Dec 1998	A
5995729	Hirosawa et al.	Nov 1999	A
6055480	Nevo et al.	Apr 2000	A
6078943	Yu	Jun 2000	A
6117180	Dave et al.	Sep 2000	A
6134511	Subbarao	Oct 2000	A
6179489	So et al.	Jan 2001	B1
6189106	Anderson	Feb 2001	B1
6216956	Ehlers et al.	Apr 2001	B1
6223274	Catthoor et al.	Apr 2001	B1
6246969	Sinclair et al.	Jun 2001	B1
6289267	Alexander et al.	Sep 2001	B1
6289488	Dave et al.	Sep 2001	B1
6298370	Tang et al.	Oct 2001	B1
6341347	Joy et al.	Jan 2002	B1
6347627	Frankie et al.	Feb 2002	B1
6351808	Joy et al.	Feb 2002	B1
6374627	Schumacher et al.	Apr 2002	B1
6494050	Spinazzola et al.	Dec 2002	B2
6507862	Joy et al.	Jan 2003	B1
6542991	Joy et al.	Apr 2003	B1
6574104	Patel et al.	Jun 2003	B2
6672955	Charron	Jan 2004	B2
6684298	Dwarkadas et al.	Jan 2004	B1
6694347	Joy et al.	Feb 2004	B2
6694759	Bash et al.	Feb 2004	B1
6714977	Fowler et al.	Mar 2004	B1
6718277	Sharma	Apr 2004	B2
6721672	Spitaels et al.	Apr 2004	B2
6735630	Gelvin et al.	May 2004	B1
6745579	Spinazzola et al.	Jun 2004	B2
6804616	Bodas	Oct 2004	B2
6819563	Chu et al.	Nov 2004	B1
6826607	Gelvin et al.	Nov 2004	B1
6827142	Winkler et al.	Dec 2004	B2
6832251	Gelvin et al.	Dec 2004	B1
6853097	Matsuda et al.	Feb 2005	B2
6859366	Fink	Feb 2005	B2
6859831	Gelvin et al.	Feb 2005	B1
6862179	Beitelmal et al.	Mar 2005	B2
6885920	Yakes et al.	Apr 2005	B2
6964539	Bradley et al.	Nov 2005	B2
6967283	Rasmussen et al.	Nov 2005	B2
6996441	Tobias	Feb 2006	B1
7020586	Snevely	Mar 2006	B2
7020701	Gelvin et al.	Mar 2006	B1
7031870	Sharma et al.	Apr 2006	B2
7051946	Bash et al.	May 2006	B2
7062304	Chauvel et al.	Jun 2006	B2
7085133	Hall	Aug 2006	B2
7089443	Albonesi et al.	Aug 2006	B2
7139999	Bowman-Amuah	Nov 2006	B2
7148796	Joy et al.	Dec 2006	B2
7155318	Sharma et al.	Dec 2006	B2
7155617	Gary et al.	Dec 2006	B2
7174194	Chauvel et al.	Feb 2007	B2
7174468	Gary et al.	Feb 2007	B2
7184866	Squires et al.	Feb 2007	B2
7197433	Patel et al.	Mar 2007	B2
7219067	McMullen et al.	May 2007	B1
7219249	Ghose et al.	May 2007	B1
7228441	Fung	Jun 2007	B2
7251547	Bash et al.	Jul 2007	B2
7251709	Williams	Jul 2007	B2
7305486	Ghose et al.	Dec 2007	B2
7313503	Nakagawa et al.	Dec 2007	B2
7315448	Bash et al.	Jan 2008	B1
7316021	Joy et al.	Jan 2008	B2
7330983	Chaparro et al.	Feb 2008	B2
7365973	Rasmussen et al.	Apr 2008	B2
7366632	Hamann et al.	Apr 2008	B2
7378165	Brignone et al.	May 2008	B2
7401333	Vandeweerd	Jul 2008	B2
7403391	Germagian et al.	Jul 2008	B2
7421601	Bose et al.	Sep 2008	B2
7426453	Patel et al.	Sep 2008	B2
7472043	Low et al.	Dec 2008	B1
7480908	Tene	Jan 2009	B1
7496735	Yourst et al.	Feb 2009	B2
7500001	Tameshige et al.	Mar 2009	B2
7549069	Ishihara et al.	Jun 2009	B2
7558649	Sharma et al.	Jul 2009	B1
7562243	Ghose	Jul 2009	B1
7568360	Bash et al.	Aug 2009	B1
7584475	Lightstone et al.	Sep 2009	B1
7590589	Hoffberg	Sep 2009	B2
7596476	Rasmussen et al.	Sep 2009	B2
7620480	Patel et al.	Nov 2009	B2
7644148	Ranganathan et al.	Jan 2010	B2
7657766	Gonzalez et al.	Feb 2010	B2
7676280	Bash et al.	Mar 2010	B1
7685601	Iwamoto	Mar 2010	B2
7702660	Chan et al.	Apr 2010	B2
7726144	Larson et al.	Jun 2010	B2
7739537	Albonesi et al.	Jun 2010	B2
7757103	Savransky et al.	Jul 2010	B2
7797367	Gelvin et al.	Sep 2010	B1
7799474	Lyon et al.	Sep 2010	B2
7818507	Yamazaki et al.	Oct 2010	B2
7844440	Nasle et al.	Nov 2010	B2
7844687	Gelvin et al.	Nov 2010	B1
7881910	Rasmussen et al.	Feb 2011	B2
7885795	Rasmussen et al.	Feb 2011	B2
7899925	Ghose et al.	Mar 2011	B2
7904905	Cervini	Mar 2011	B2
7908126	Bahel et al.	Mar 2011	B2
7912955	Machiraju et al.	Mar 2011	B1
7975156	Artman et al.	Jul 2011	B2
7979250	Archibald et al.	Jul 2011	B2
7992151	Warrier et al.	Aug 2011	B2
8006111	Faibish et al.	Aug 2011	B1
8015567	Hass	Sep 2011	B2
8020163	Nollet et al.	Sep 2011	B2
8046558	Ghose	Oct 2011	B2
8051310	He et al.	Nov 2011	B2
8099731	Li et al.	Jan 2012	B2
8117367	Conti et al.	Feb 2012	B2
8135851	Pilkington et al.	Mar 2012	B2
8140658	Gelvin et al.	Mar 2012	B1
8155922	Loucks	Apr 2012	B2
8200995	Shiga et al.	Jun 2012	B2
8219362	Shrivastava et al.	Jul 2012	B2
8219993	Johnson et al.	Jul 2012	B2
8228046	Ingemi et al.	Jul 2012	B2
8244502	Hamann et al.	Aug 2012	B2
8245059	Jackson	Aug 2012	B2
8249825	VanGilder et al.	Aug 2012	B2
8260628	Lopez et al.	Sep 2012	B2
8271807	Jackson	Sep 2012	B2
8271813	Jackson	Sep 2012	B2
8276008	Jackson	Sep 2012	B2
8285999	Ghose et al.	Oct 2012	B1
8301315	Dawson et al.	Oct 2012	B2
8302098	Johnson et al.	Oct 2012	B2
8321712	Ghose	Nov 2012	B2
8327158	Titiano et al.	Dec 2012	B2
8344546	Sarti	Jan 2013	B2
8365176	Campbell et al.	Jan 2013	B2
8397088	Ghose	Mar 2013	B1
8417391	Rombouts et al.	Apr 2013	B1
8424006	Jacobson et al.	Apr 2013	B2
8425287	Wexler	Apr 2013	B2
8438364	Venkataramani	May 2013	B2
8447993	Greene et al.	May 2013	B2
8452999	Barth et al.	May 2013	B2
8467906	Michael et al.	Jun 2013	B2
8473265	Hlasny et al.	Jun 2013	B2
8499302	Hass	Jul 2013	B2
8509959	Zhang et al.	Aug 2013	B2
8527747	Hintermeister et al.	Sep 2013	B2
8527997	Bell, Jr. et al.	Sep 2013	B2
8533719	Fedorova et al.	Sep 2013	B2
8549333	Jackson	Oct 2013	B2
8554515	VanGilder et al.	Oct 2013	B2
8560677	VanGilder et al.	Oct 2013	B2
8565931	Marwah et al.	Oct 2013	B2
8566447	Cohen et al.	Oct 2013	B2
8583945	Tran	Nov 2013	B2
8589931	Barsness et al.	Nov 2013	B2
8589932	Bower, III et al.	Nov 2013	B2
8595586	Borthakur et al.	Nov 2013	B2
8600576	Dawson et al.	Dec 2013	B2
8600830	Hoffberg	Dec 2013	B2
8612688	Venkataramani et al.	Dec 2013	B2
8612785	Brown et al.	Dec 2013	B2
8612984	Bell, Jr. et al.	Dec 2013	B2
8631411	Ghose	Jan 2014	B1
8639113	DeCusatis et al.	Jan 2014	B2
8639482	Rasmussen et al.	Jan 2014	B2
8667063	Graham et al.	Mar 2014	B2
8677365	Bash et al.	Mar 2014	B2
8684802	Gross et al.	Apr 2014	B1
8688413	Healey et al.	Apr 2014	B2
8689220	Prabhakar et al.	Apr 2014	B2
8700938	Ghose	Apr 2014	B2
8723362	Park et al.	May 2014	B2
8725307	Healey et al.	May 2014	B2
8736109	Park	May 2014	B2
8751897	Borthakur et al.	Jun 2014	B2
8776069	Prabhakar et al.	Jul 2014	B2
8782434	Ghose	Jul 2014	B1
8782435	Ghose	Jul 2014	B1
8793328	Lindamood et al.	Jul 2014	B2
8793351	Renzin	Jul 2014	B2
8798964	Rosenthal et al.	Aug 2014	B2
8820113	Heydari et al.	Sep 2014	B2
8825219	Gheerardyn et al.	Sep 2014	B2
8825451	VanGilder et al.	Sep 2014	B2
8832111	Venkataramani et al.	Sep 2014	B2
8838281	Rombouts et al.	Sep 2014	B2
8842432	Ehlen	Sep 2014	B2
8862668	Graham et al.	Oct 2014	B2
8867213	Furuta et al.	Oct 2014	B2
8869158	Prabhakar et al.	Oct 2014	B2
8874836	Hayes et al.	Oct 2014	B1
8885335	Magarelli	Nov 2014	B2
8897017	Brashers et al.	Nov 2014	B2
8903876	Michael et al.	Dec 2014	B2
8904189	Ghose	Dec 2014	B1
8904394	Dawson et al.	Dec 2014	B2
8913377	Furuta	Dec 2014	B2
8914155	Shah et al.	Dec 2014	B1
8925339	Kearney et al.	Jan 2015	B2
8930705	Ghose et al.	Jan 2015	B1
8937405	Park	Jan 2015	B2
8949081	Healey	Feb 2015	B2
8949632	Kobayashi et al.	Feb 2015	B2
8954675	Venkataramani et al.	Feb 2015	B2
8972217	VanGilder et al.	Mar 2015	B2
8972570	Moreels et al.	Mar 2015	B1
8991198	Kearney et al.	Mar 2015	B2
8996180	VanGilder et al.	Mar 2015	B2
8996810	Liang	Mar 2015	B2
9015324	Jackson	Apr 2015	B2
9026807	Jackson	May 2015	B2
9027024	Mick et al.	May 2015	B2
9060449	Ehlen	Jun 2015	B2
9063721	Ghose	Jun 2015	B2
9086883	Thomson et al.	Jul 2015	B2
9098351	Bell, Jr. et al.	Aug 2015	B2
9098876	Steven et al.	Aug 2015	B2
9104493	Molkov et al.	Aug 2015	B2
9122525	Barsness et al.	Sep 2015	B2
9122717	Liang	Sep 2015	B2
9122873	Ghose	Sep 2015	B2
9135063	Ghose	Sep 2015	B1
9141155	Wiley	Sep 2015	B2
9144181	Wiley	Sep 2015	B2
9148068	Sarti	Sep 2015	B2
9159042	Steven et al.	Oct 2015	B2
9159108	Steven et al.	Oct 2015	B2
9164566	Ghose	Oct 2015	B2
9171276	Steven et al.	Oct 2015	B2
9173327	Wiley	Oct 2015	B2
9177072	Krishnamurthy et al.	Nov 2015	B2
9178958	Lindamood et al.	Nov 2015	B2
9192077	Iqbal	Nov 2015	B2
9208207	Venkataramani et al.	Dec 2015	B2
9219644	Renzin	Dec 2015	B2
9219657	Dawson et al.	Dec 2015	B2
9223905	Dalgas et al.	Dec 2015	B2
9223967	Ghose	Dec 2015	B2
9230122	Ghose	Jan 2016	B2
9235441	Brech et al.	Jan 2016	B2
9240025	Ward, Jr. et al.	Jan 2016	B1
9250962	Brech et al.	Feb 2016	B2
9264466	Graham et al.	Feb 2016	B2
9274710	Oikarinen et al.	Mar 2016	B1
9277026	Liang et al.	Mar 2016	B2
9286642	Hochberg et al.	Mar 2016	B2
9310786	Imhof et al.	Apr 2016	B2
9322169	Magarelli et al.	Apr 2016	B2
9335747	Steven et al.	May 2016	B2
9338928	Lehman	May 2016	B2
9342376	Jain et al.	May 2016	B2
9342464	Krishnamurthy et al.	May 2016	B2
9344151	Cenizal et al.	May 2016	B2
9354683	Patiejunas et al.	May 2016	B2
9355060	Barber et al.	May 2016	B1
9367052	Steven et al.	Jun 2016	B2
9367825	Steven et al.	Jun 2016	B2
9374309	Shaw et al.	Jun 2016	B2
9377837	Ghose	Jun 2016	B2
9715264	Ghose	Jul 2017	B2
9753465	Ghose	Sep 2017	B1
9762399	Ghose	Sep 2017	B2
9767271	Ghose	Sep 2017	B2
9767284	Ghose	Sep 2017	B2
20010042616	Baer	Nov 2001	A1
20020004842	Ghose et al.	Jan 2002	A1
20020043969	Duncan et al.	Apr 2002	A1
20020053038	Buyuktosunoglu et al.	May 2002	A1
20020059804	Spinazzola et al.	May 2002	A1
20020071031	Lord et al.	Jun 2002	A1
20020072868	Bartone et al.	Jun 2002	A1
20020078122	Joy et al.	Jun 2002	A1
20020133729	Therien et al.	Sep 2002	A1
20020149911	Bishop et al.	Oct 2002	A1
20020174319	Rivers et al.	Nov 2002	A1
20030028582	Kosanovic	Feb 2003	A1
20030061258	Rodgers et al.	Mar 2003	A1
20030084159	Blewett	May 2003	A1
20030097478	King	May 2003	A1
20030115000	Bodas	Jun 2003	A1
20030115024	Snevely	Jun 2003	A1
20030147216	Patel et al.	Aug 2003	A1
20030158718	Nakagawa et al.	Aug 2003	A1
20030177406	Bradley et al.	Sep 2003	A1
20030232598	Aljadeff et al.	Dec 2003	A1
20040006584	Vandeweerd	Jan 2004	A1
20040020224	Bash et al.	Feb 2004	A1
20040065097	Bash et al.	Apr 2004	A1
20040065104	Bash et al.	Apr 2004	A1
20040073324	Pierro et al.	Apr 2004	A1
20040073822	Greco	Apr 2004	A1
20040075984	Bash et al.	Apr 2004	A1
20040078419	Ferrari et al.	Apr 2004	A1
20040089009	Bash et al.	May 2004	A1
20040089011	Patel et al.	May 2004	A1
20040163001	Bodas	Aug 2004	A1
20040189161	Davis et al.	Sep 2004	A1
20040205761	Partanen	Oct 2004	A1
20040221287	Walmsley	Nov 2004	A1
20040240514	Bash et al.	Dec 2004	A1
20040242197	Fontaine	Dec 2004	A1
20040262409	Crippen et al.	Dec 2004	A1
20050023363	Sharma et al.	Feb 2005	A1
20050033889	Hass et al.	Feb 2005	A1
20050063542	Ryu	Mar 2005	A1
20050075839	Rotheroe	Apr 2005	A1
20050102674	Tameshige et al.	May 2005	A1
20050108720	Cervini	May 2005	A1
20050132376	Rodgers et al.	Jun 2005	A1
20050154507	Pierro et al.	Jul 2005	A1
20050225936	Day	Oct 2005	A1
20050228618	Patel et al.	Oct 2005	A1
20050240745	Iyer et al.	Oct 2005	A1
20050267639	Sharma et al.	Dec 2005	A1
20060047808	Sharma et al.	Mar 2006	A1
20060080001	Bash et al.	Apr 2006	A1
20060095911	Uemura et al.	May 2006	A1
20060096306	Okaza et al.	May 2006	A1
20060112261	Yourst et al.	May 2006	A1
20060112286	Whalley et al.	May 2006	A1
20060115586	Xing et al.	Jun 2006	A1
20060121421	Spitaels et al.	Jun 2006	A1
20060139877	Germagian et al.	Jun 2006	A1
20060168975	Malone et al.	Aug 2006	A1
20060179436	Yasue	Aug 2006	A1
20060214014	Bash et al.	Sep 2006	A1
20060259622	Moore	Nov 2006	A1
20070019569	Park et al.	Jan 2007	A1
20070038414	Rasmussen et al.	Feb 2007	A1
20070067595	Ghose	Mar 2007	A1
20070074222	Kunze	Mar 2007	A1
20070078635	Rasmussen et al.	Apr 2007	A1
20070083870	Kanakogi	Apr 2007	A1
20070121295	Campbell et al.	May 2007	A1
20070150215	Spitaels et al.	Jun 2007	A1
20070171613	McMahan et al.	Jul 2007	A1
20070174024	Rasmussen et al.	Jul 2007	A1
20070180117	Matsumoto et al.	Aug 2007	A1
20070190919	Donovan et al.	Aug 2007	A1
20070213000	Day	Sep 2007	A1
20070226741	Seshadri	Sep 2007	A1
20070260417	Starmer et al.	Nov 2007	A1
20070271559	Easton	Nov 2007	A1
20070274035	Fink et al.	Nov 2007	A1
20080041076	Tutunoglu et al.	Feb 2008	A1
20080104604	Li et al.	May 2008	A1
20080104985	Carlsen	May 2008	A1
20080105412	Carlsen et al.	May 2008	A1
20080115140	Erva et al.	May 2008	A1
20080133474	Hsiao et al.	Jun 2008	A1
20080134191	Warrier et al.	Jun 2008	A1
20080162983	Baba et al.	Jul 2008	A1
20080174954	VanGilder et al.	Jul 2008	A1
20080180908	Wexler	Jul 2008	A1
20080186670	Lyon et al.	Aug 2008	A1
20080209243	Ghiasi et al.	Aug 2008	A1
20080216074	Hass et al.	Sep 2008	A1
20080216076	Udell et al.	Sep 2008	A1
20090030554	Bean, Jr. et al.	Jan 2009	A1
20090049447	Parker	Feb 2009	A1
20090064164	Bose	Mar 2009	A1
20090094481	Vera et al.	Apr 2009	A1
20090138313	Morgan	May 2009	A1
20090138888	Shah et al.	May 2009	A1
20090150123	Archibald et al.	Jun 2009	A1
20090150129	Archibald et al.	Jun 2009	A1
20090150700	Dell'Era	Jun 2009	A1
20090150893	Johnson et al.	Jun 2009	A1
20090171511	Tolentino	Jul 2009	A1
20090199019	Hongisto et al.	Aug 2009	A1
20090205416	Campbell et al.	Aug 2009	A1
20090217277	Johnson et al.	Aug 2009	A1
20090223234	Campbell et al.	Sep 2009	A1
20090259343	Rasmussen et al.	Oct 2009	A1
20090265568	Jackson	Oct 2009	A1
20090309570	Lehmann et al.	Dec 2009	A1
20090326879	Hamann et al.	Dec 2009	A1
20090326884	Amemiya et al.	Dec 2009	A1
20100010688	Hunter	Jan 2010	A1
20100017638	Ghose	Jan 2010	A1
20100031259	Inoue	Feb 2010	A1
20100046370	Ghose et al.	Feb 2010	A1
20100100254	Artman	Apr 2010	A1
20100100877	Greene et al.	Apr 2010	A1
20100106464	Hlasny et al.	Apr 2010	A1
20100139908	Slessman	Jun 2010	A1
20100146316	Carter et al.	Jun 2010	A1
20100153956	Capps, Jr. et al.	Jun 2010	A1
20100162252	Bacher	Jun 2010	A1
20100174886	Kimelman	Jul 2010	A1
20100180089	Flemming et al.	Jul 2010	A1
20100217454	Spiers et al.	Aug 2010	A1
20100241285	Johnson et al.	Sep 2010	A1
20100241881	Barsness et al.	Sep 2010	A1
20100256959	VanGilder et al.	Oct 2010	A1
20100269116	Potkonjak	Oct 2010	A1
20100286956	VanGilder et al.	Nov 2010	A1
20100287018	Shrivastava et al.	Nov 2010	A1
20100293313	Ferringer et al.	Nov 2010	A1
20100313203	Dawson et al.	Dec 2010	A1
20100324739	Dawson et al.	Dec 2010	A1
20100324956	Lopez et al.	Dec 2010	A1
20110016339	Dasgupta et al.	Jan 2011	A1
20110035078	Jackson	Feb 2011	A1
20110038634	DeCusatis et al.	Feb 2011	A1
20110040529	Hamann et al.	Feb 2011	A1
20110055605	Jackson	Mar 2011	A1
20110072293	Mazzaferri et al.	Mar 2011	A1
20110107332	Bash	May 2011	A1
20110161696	Fletcher	Jun 2011	A1
20110173470	Tran	Jul 2011	A1
20110213508	Mandagere et al.	Sep 2011	A1
20110239010	Jain et al.	Sep 2011	A1
20110246995	Fedorova et al.	Oct 2011	A1
20110271283	Bell, Jr. et al.	Nov 2011	A1
20110283119	Szu et al.	Nov 2011	A1
20110296212	Elnozahy et al.	Dec 2011	A1
20110302582	Jacobson et al.	Dec 2011	A1
20120005683	Bower et al.	Jan 2012	A1
20120053778	Colvin et al.	Mar 2012	A1
20120071992	VanGilder et al.	Mar 2012	A1
20120072916	Hintermeister et al.	Mar 2012	A1
20120079235	Iyer et al.	Mar 2012	A1
20120079380	Tsai et al.	Mar 2012	A1
20120084790	Elshishiny et al.	Apr 2012	A1
20120109391	Marwah et al.	May 2012	A1
20120131309	Johnson et al.	May 2012	A1
20120158387	VanGilder et al.	Jun 2012	A1
20120170205	Healey et al.	Jul 2012	A1
20120180055	Brech et al.	Jul 2012	A1
20120210325	de Lind van Wijngaarden et al.	Aug 2012	A1
20120216065	Nastacio	Aug 2012	A1
20120216190	Sivak	Aug 2012	A1
20120216205	Bell, Jr. et al.	Aug 2012	A1
20120221872	Artman et al.	Aug 2012	A1
20120266174	Inoue	Oct 2012	A1
20120278810	Dawson et al.	Nov 2012	A1
20120290862	Brown et al.	Nov 2012	A1
20120323393	Imhof et al.	Dec 2012	A1
20130006426	Healey et al.	Jan 2013	A1
20130061236	Ghose	Mar 2013	A1
20130066477	Jiang	Mar 2013	A1
20130104136	Brech et al.	Apr 2013	A1
20130124003	Lehman	May 2013	A1
20130132972	Sur et al.	May 2013	A1
20130139170	Prabhakar et al.	May 2013	A1
20130166885	Ramani et al.	Jun 2013	A1
20130178991	Gheerardyn et al.	Jul 2013	A1
20130178993	Rombouts et al.	Jul 2013	A1
20130245847	Steven et al.	Sep 2013	A1
20130290955	Turner et al.	Oct 2013	A1
20130304903	Mick et al.	Nov 2013	A1
20130346139	Steven et al.	Dec 2013	A1
20130346987	Raney et al.	Dec 2013	A1
20140039965	Steven et al.	Feb 2014	A1
20140046908	Patiejunas et al.	Feb 2014	A1
20140047261	Patiejunas et al.	Feb 2014	A1
20140047266	Borthakur et al.	Feb 2014	A1
20140059556	Barsness et al.	Feb 2014	A1
20140074876	Venkataramani et al.	Mar 2014	A1
20140075448	Bell, Jr. et al.	Mar 2014	A1
20140082327	Ghose	Mar 2014	A1
20140082329	Ghose	Mar 2014	A1
20140129779	Frachtenberg et al.	May 2014	A1
20140136625	Graham et al.	May 2014	A1
20140164700	Liang	Jun 2014	A1
20140229221	Shih et al.	Aug 2014	A1
20140237090	Lassen et al.	Aug 2014	A1
20140257907	Chen et al.	Sep 2014	A1
20140278692	Marwah et al.	Sep 2014	A1
20140310427	Shaw et al.	Oct 2014	A1
20140325238	Ghose	Oct 2014	A1
20140325239	Ghose	Oct 2014	A1
20140330611	Steven et al.	Nov 2014	A1
20140330695	Steven et al.	Nov 2014	A1
20150012710	Liang et al.	Jan 2015	A1
20150019036	Murayama et al.	Jan 2015	A1
20150026695	Dawson et al.	Jan 2015	A1
20150057824	Gheerardyn et al.	Feb 2015	A1
20150066225	Chen et al.	Mar 2015	A1
20150088576	Steven et al.	Mar 2015	A1
20150112497	Steven et al.	Apr 2015	A1
20150161199	Pinko	Jun 2015	A1
20150177808	Sarti	Jun 2015	A1
20150186492	Shalita et al.	Jul 2015	A1
20150192978	Ghose	Jul 2015	A1
20150192979	Ghose	Jul 2015	A1
20150234441	Jackson	Aug 2015	A1
20150235308	Mick et al.	Aug 2015	A1
20150278968	Steven et al.	Oct 2015	A1
20150286821	Ghose	Oct 2015	A1
20150317349	Chao et al.	Nov 2015	A1
20160019093	Dawson et al.	Jan 2016	A1
20160055036	Dawson et al.	Feb 2016	A1
20160078695	McClintic et al.	Mar 2016	A1
20160087909	Chatterjee et al.	Mar 2016	A1
20160117501	Ghose	Apr 2016	A1
20160118790	Imhof et al.	Apr 2016	A1
20160119148	Ghose	Apr 2016	A1
20160179711	Oikarinen et al.	Jun 2016	A1
20160180474	Steven et al.	Jun 2016	A1
20160306410	Ghose	Oct 2016	A1
20170329384	Ghose	Nov 2017	A1
20180228060	Alissa et al.	Aug 2018	A1

Non-Patent Literature Citations (3)

Entry
Dynamic thermal management of air cooled data centers. Bash, C. B.; Patel, C. D.; Sharma, R. K. 2006 Proceedings, 10th Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronics Systems (IEEE Cat. No. 06CH37733C): 8. IEEE (2006).
Raritan's Power Management Solutions Receive Top Emerging Technology Recognition at XChange Tech Innovator Conference. PR Newswire, Nov. 19, 2008.
HP Advances Flexibility, Efficiency of Blades Across the Data Center. Business Wire, Nov. 12, 2007.

Provisional Applications (1)

	Number	Date	Country
	61227361	Jul 2009	US

Continuations (2)

	Number	Date	Country
Parent	15490525	Apr 2017	US
Child	17473107		US
Parent	12841160	Jul 2010	US
Child	15490525		US
