This generally relates to computing and information technology (“IT”) power consumption and more particularly to devices for the prediction and classification of power and/or resource utilization in computer systems.
Modern data center planning and operations require that energy management be addressed comprehensively throughout the data center environment, including scenarios involving multiple data centers. In the modern IT environment, it is generally no longer adequate to conduct performance management of IT equipment alone; detailed monitoring and measurement of data center performance, utilization, and energy consumption to support detailed cost control, high-level IT security, and “greener” environments are now typical business requirements. Modern data centers and/or other computing systems or processes create high resource demands, and the associated costs of these resources necessitate high-level capacity planning.
Conventional capacity planning power consumption prediction tools include “look up table” tools requiring the user to enter the system configuration parameters before the tool retrieves the corresponding predictive power consumption. A majority of these tools do not consider current and/or newer systems' respective operational workloads as input. Rather, these tools' typical inputs are from static or semi-static measurements from monitoring tools connected to existing systems (hardware) only. Additionally, conventional servers often host multiple applications, which in the IT environment are likely to come from different business units as modern companies find it prudent to spread applications from different business units throughout their hardware to limit the impact of a hardware failure on individual business units.
Additionally, modern data centers and/or other computing systems or processes often utilize virtualization, or “cloud computing”: internet-based computing whereby shared resources, software, and other information are provided to computers and other devices on demand. Cloud computing is a byproduct and consequence of the advancing ease of access to remote computing sites provided by the internet, and has become increasingly popular because it allows high-level use of servers by customers without the need for them to have expertise in, or control over, the technology infrastructure in the cloud that supports their data centers and/or other computing systems or processes. Many cloud computing offerings employ the utility computing billing model, which is analogous to the consumption-based billing of traditional utility services such as electricity. Workload-based energy and resource utilization management is typically more significant in cloud computing environments because the actual system equipment cannot be directly managed, monitored, or metered.
Modern computing has continually shifted workloads away from physical computers and onto virtual machines. Virtual machines are separated into two major categories based on their use and degree of correspondence to any real machine. A system virtual machine provides a complete system platform which supports the execution of a complete operating system (OS). In contrast, a process virtual machine is typically designed to run a single program, meaning that it supports a single process. Conventional computing offers no near-real-time or real-time method of monitoring power consumption or power usage for such devices, which are not and/or cannot be connected to a metered power source. Additionally, a busy virtual machine can easily reach the memory limit of the physical machine it is running on, requiring the virtual machine administrator to shift the virtual machine to another target platform whose memory is less taxed in a process called “Vmotion.” Vmotion of one or more virtual machines to a target platform located in a distinct heating, ventilating, and air conditioning (“HVAC”) zone can create a “hot spot” in that HVAC zone, causing the HVAC system to expend a large amount of energy to re-establish the steady state in that zone. Overall, the current state of power consumption prediction technology offers no approach allowing for the management and optimization of the assignment of virtual machines to host platforms. Moreover, these methods do not take into consideration current or newer systems' operational workloads as input data.
Finally, the rise of modern computing has seen a corresponding rise in computer crime and other anomalous, clandestine, and unauthorized uses of system capacity. Conventional anomaly detection methods and systems distinguish anomalous use through network traffic and/or system logs. However, the classification of such attacks and other anomalous uses becomes more difficult as the sophistication of the attacker rises. For instance, sophisticated malware can launch an attack that avoids normal detection methods and only causes a system or process's power and/or resource usage to briefly increase, a blip that is conventionally indiscernible by current detection methods. Further, such malware can hide inside a system's trusted processes, e.g., OS level software tasks, which can include the on-board monitoring facilities themselves, making the detection of such anomalous events even more difficult or nearly impossible prior to system failure.
In accordance with methods and systems consistent with the present invention, a method in a data processing system is provided for predicting future power consumption in computing systems. The method comprises receiving an indication of one or more computing devices to predict power for, and receiving one or more input parameters associated with the one or more computing devices. It further comprises automatically generating a prediction of the power consumption of the one or more computing devices over a future time interval, and transmitting the generated prediction.
In one implementation, a data processing system for predicting future power consumption in computing systems is provided. The data processing system comprises a memory comprising instructions to cause a processor to receive an indication of one or more computing devices to predict power for, and receive one or more input parameters associated with the one or more computing devices. The instructions further cause the processor to automatically generate a prediction of the power consumption of the one or more computing devices over a future time interval, and transmit the generated prediction. The data processing system further comprises a processor configured to execute the instructions in the memory.
In another implementation, a method in a data processing system is provided for determining current power consumption and predicting future power consumption in computing systems. The method comprises receiving an indication of one or more computing devices to predict power for, and receiving one or more input parameters associated with the one or more computing devices. The method further comprises automatically generating one of: 1) a current status of the power consumption of the one or more computing devices, and 2) a prediction of the power consumption of the one or more computing devices over a future time interval, and transmitting the one of: (1) the current status of the power consumption and (2) the generated prediction.
Methods and systems in accordance with the present invention provide accurate power and/or resource consumption predictions and classifications in monolithic physical servers, facility equipment, individual virtual machines, groups of virtual machines running on a common physical host, and individual processes and applications running on such machines. Methods and systems consistent with the present invention apply domain agnostic data mining and machine learning predictive and classification modeling to quantitatively characterize power consumption and resource utilization characteristics of data centers and other associated computing and infrastructure systems and/or processes.
Further, workload-based energy and resource utilization management measurement, prediction, and classification enables organizations to place value on every kilowatt (“kW”) of energy used in their data centers as well as accurately charge back operational costs to their customers. Methods and systems consistent with the present invention further enable organizations to schedule the time and place applications run based on energy cost and availability. A company with geographically diverse datacenters may be able to schedule certain applications to run on datacenters located in areas where it is nighttime, potentially saving costs because energy tariffs are typically lower at night. Further, when organizations use cloud computing, the general energy costs are apportioned. Methods and systems consistent with the present invention allow greater transparency of individual workload associated energy costs, which can be used in financial modeling and metrics. Additionally, methods and systems consistent with the present invention enable users to compare the energy efficiency of their software.
Data Mining and/or Machine Learning (the terms are used interchangeably in the field) is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data. A focus of machine learning is to automatically learn to infer and recognize complex patterns within such data in order to make intelligent decisions based on such patterns and inferred knowledge. The difficulty lies in the fact that the set of all possible behaviors given all possible inputs is typically too complex to describe manually or in a semi-automated fashion. Domain agnosticism is a characteristic of data mining and machine learning whereby the same principles and algorithms are applicable to many different types of computing or non-computing devices beyond servers, personal computers, or workstations, including such disparate devices as UPSs, networked storage processors, generators, battery backup systems, and other applicable pieces of equipment, including HVAC controllers, inside the data center as well as outside it. This characteristic allows scalable infrastructure management (“IM”) for single and multiple data centers as well as cloud computing infrastructures. Specifically, predictive models (processes or algorithms that find and describe structural patterns in data, helping to explain such data and make predictions from it) are programmatically created with the help of a machine learning library toolkit (Weka) and can forecast and classify power consumption and resource usage as a function of hardware (virtualized or non-virtualized) resource utilization.
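The Weka toolkit named above is Java-based; as a language-neutral illustration of the underlying idea (a model that forecasts power as a function of resource utilization), the following pure-Python sketch fits a linear model of watts against CPU % and memory % via ordinary least squares. All names and training figures are hypothetical, not taken from this description.

```python
# Illustrative stand-in for the Weka-based model building described above:
# fit power (watts) as a linear function of CPU % and memory % by solving
# the ordinary-least-squares normal equations (all figures hypothetical).

def fit_power_model(samples):
    """samples: iterable of (cpu_pct, mem_pct, watts) training rows."""
    rows = [((1.0, c, m), w) for c, m, w in samples]  # features [1, cpu, mem]
    n = 3
    # Normal equations: (X^T X) coef = X^T y
    A = [[sum(x[i] * x[j] for x, _ in rows) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * w for x, w in rows) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for j in range(col, n):
                A[r][j] -= f * A[col][j]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    b0, b_cpu, b_mem = coef
    return lambda cpu, mem: b0 + b_cpu * cpu + b_mem * mem

# Synthetic training rows generated as watts = 200 + 2*cpu + 1.5*mem.
train = [(10, 10, 235), (50, 40, 360), (90, 80, 500), (30, 60, 350), (70, 20, 370)]
predict = fit_power_model(train)
```

Because the training rows here lie exactly on a plane, the fitted model recovers it exactly; real measurements would fit only approximately.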
The models efficiently provide predictions for energy consumption, for example in kilowatts (“kW”); power cost, for example in total cost per predicted period; heat dissipation, for example in British Thermal Units per hour (“BTU/hr”); greenhouse gas effects, for example in pounds per year (“lbs/year”); and other pertinent forecasts and resource utilization classifications.
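The quantities above are related by fixed unit conversions plus regional rates. A brief sketch follows; the tariff and the regional emissions factor are illustrative assumptions, not values from this description.

```python
# Hedged unit-conversion sketch for the forecast outputs listed above.
# The tariff and the regional emissions factor are assumed example values.
BTU_HR_PER_KW = 3412.14      # 1 kW dissipated as heat is about 3412.14 BTU/hr
HOURS_PER_YEAR = 8760

def heat_btu_hr(kw):
    return kw * BTU_HR_PER_KW

def period_cost_usd(kw, hours, usd_per_kwh=0.12):       # tariff assumed
    return kw * hours * usd_per_kwh

def co2_lbs_per_year(kw, lbs_co2_per_kwh=1.2):          # regional factor assumed
    return kw * HOURS_PER_YEAR * lbs_co2_per_kwh
```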
A Power Capacity Planner (“PCP”) is a component application that includes some of the features of a Data Center Infrastructure Management (“DCIM”) system. Data center infrastructure management comprises the control, monitoring, tuning, and other management functions of the equipment and resources needed and used in data centers. The PCP provides power consumption, heat dissipation, regional cost-per-unit of power, and regional greenhouse effects predictions based on potential, user-input, time-varying server workloads, for both virtualized and non-virtualized servers. A workload is the system (server) resource (CPU and memory) utilization required by operational business applications. A workload comprises the CPU and memory resources needed by a software application to function as expected. A workload can vary based on how much work a business application(s), or any other suitable application, is currently performing. Workloads are typically measured within the system hosting the application(s). Workloads may be “synthetically” generated in order to effectively optimize prediction and classification capabilities. Predictive and classification models are effectively independent of software running on the target system(s). The power draw/footprint of the hardware, whether it is virtualized or non-virtualized, is a primary factor used to generate the predictive and classification models. Any number of servers with equal or similar power consumption footprints may be grouped and analyzed together, providing the capability to consolidate or expand server quantities as needed. This also facilitates the “relocation” or “movement” of servers (typically virtualized) to other, less taxed HVAC cooling zones within a data center, for example. The PCP also allows efficient, customized creation of models in real time for those virtualized or non-virtualized platforms that have not been categorized previously.
The PCP application may use machine learning technology that enables the prediction and classification modeling of dependent variables (outputs), such as power consumed, based on data sources containing independent variables (inputs), such as resource (CPU and memory) utilization, which may be measured as percentages.
The PCP may be web-enabled and may comprise a client front end (or web-service), in which the user inputs relevant parameters with some up-front processing taking place, and a server back end, in which most of the processing as well as the execution of the machine learning models occurs. In one implementation, the bridge between the client front end and the server back end is Java Server Pages (“JSP”), which facilitate the use of the HTTP protocol over the internet for fast and efficient distributed data sharing. The client front end may be, for example, implemented using Adobe Flex/Flash Multi-Media Optimized XML (“MXML”) and ActionScript for high quality graphics. The server back end may be implemented using Java and/or Oracle Fusion middleware to optimize portability. However, any other suitable implementation may be used.
According to one embodiment, processor 104 executes one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions in main memory 106 causes processor 104 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 106. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, embodiments are not limited to any specific combination of hardware circuitry and software.
Although described relative to main memory 106 and storage device 110, instructions and other aspects of methods and systems consistent with the present invention may reside on another computer-readable medium, such as a floppy disk, flexible disk, hard disk, magnetic tape, CD-ROM, magnetic, optical or physical medium, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read, either now known or later discovered.
Computer 100 also includes a communication interface 118 coupled to bus 102. Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to one or more network 122, such as the Internet or other computer network. Wireless links may also be implemented. Communication interface 118 may send and receive signals that carry digital data streams representing various types of information.
In one implementation, computer 100 may operate as a web server (or service) on a computer network 122 such as the Internet. Computer 100 may also represent other computers on the Internet, such as users' computers having web browsers, and the user's computers may have similar components as computer 100.
A Server Planner component of the PCP enables the prediction of power consumption, heat dissipation, regional power costs, and regional greenhouse gas effects based on potential, user-defined, time-varying server workloads. It may use prediction models. Time-varying workload profiles allow effective and realistic prediction of power consumption and cooling requirements that fluctuate over time. These power consumption predictions may be used to plan computer usage in data centers, for example.
The Server Planner allows a user to estimate power consumption for any number of homogeneous or heterogeneous servers that have similar power draw requirements. In one implementation, it may also work for servers with dissimilar energy consumption requirements. Generally, heterogeneous servers can be grouped together if they have similar power consumption levels during significant workloads and at idle times. The Server Planner also defines work profiles, discussed further in relation to
The Server Planner also displays the outcome of different potential scenarios and may allow the “stacking” of plotted/graphed scenarios (for example, a certain number and type of server having a certain power draw at particular time intervals) on the same charts. It is possible to compare the power, heat, cost, and greenhouse effects produced by different potential scenarios, defined by work profiles for example, graphically and statistically within the same individual charts. A scenario may include, for example, comparing 10 racks of 50 Dell PE2900 servers with an average power draw of about 270 kW versus 2 racks of 80 Dell PE2900 servers with an average power draw of about 90 kW.
Workload % 208 is the workload defined for the server chosen. In one implementation, workload may be defined as the percentage of CPU and memory being utilized by the server's business application(s). +/−222 represents a user-defined acceptable level of variance in the workload percentage entered. Workload Type 224 defines the distribution of the chosen workload between CPU utilization and memory utilization. For example, if a user enters a Workload % of 30% and selects a “balanced” workload type, which is defined as nearly equal CPU and memory utilization, the system creates a model based on similar CPU utilization and memory utilization, in this case about 15% for each. Other potential workload types include, but are not limited to, “CPU intensive” or “Memory intensive”. In Start 226 the user enters the starting date of the analysis. In End 228, the user enters the ending date of the analysis. In Time Interval Period Dropdown Menu 230, the user may enter the unit of time of the modeled time interval. For example, the menu options may include hours, days, weeks, months, years, or any other unit of time.
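The workload-type split can be sketched as a simple mapping. The "balanced" ratio matches the 30% example above; the "intensive" ratios are illustrative assumptions, not values given in this description.

```python
# Sketch of splitting a single Workload % into CPU and memory utilization
# according to the selected workload type.
SPLITS = {
    "balanced":         (0.5, 0.5),    # per the 30% -> ~15%/15% example
    "CPU intensive":    (0.75, 0.25),  # illustrative ratio (assumed)
    "Memory intensive": (0.25, 0.75),  # illustrative ratio (assumed)
}

def split_workload(workload_pct, workload_type):
    cpu_share, mem_share = SPLITS[workload_type]
    return workload_pct * cpu_share, workload_pct * mem_share
```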
Once the user enters the parameters, the user may click PROCESS 232 to initiate the prediction process based on input parameters. In one implementation, the PCP opens the power prediction chart automatically upon conclusion of model processing.
Clicking Configuration 234 opens Page View 200, the initial input parameter definition screen of the Server Planner implementation, which allows the user to enter and select the values needed to generate power predictions. Clicking Work Profiles 236 allows the definition of specific work profiles that require different workloads within a given time interval or sub-interval, as discussed below in relation to
A process, described in further detail below in relation to
The user may also perform the “Work Profiles” function of the PCP within the Server Planner. Work Profiles allows the definition of workloads that differ from, and change over time relative to, those defined for an entire time interval. It allows the definition of specific use cases, for example a case in which special workloads are needed for each weekend of a given month. The Work Profiles feature may be activated once a potential scenario has been processed. In one implementation, after such processing, the PCP automatically navigates to the Power screen, the screen that shows the power consumption over time. At this point, and after analysis of the charts, the user can activate Work Profiles to define any special workload requirements within the given scenario's time interval, that is, workloads over specific period(s) of time within the scenario's previously defined full time interval.
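A minimal sketch of the override behavior described above, assuming a hypothetical data model in which a profile carries a start date, an end date, and a workload percentage:

```python
# Overlay a work profile onto a scenario's base workload: intervals covered
# by the profile use its workload; all others keep the scenario-wide value.
from datetime import date

def apply_profile(intervals, base_pct, profile):
    """intervals: list of dates; profile: {'start', 'end', 'pct'} override."""
    out = []
    for day in intervals:
        if profile["start"] <= day <= profile["end"]:
            out.append((day, profile["pct"]))   # profile overrides the base
        else:
            out.append((day, base_pct))
    return out

days = [date(2010, 1, d) for d in (1, 2, 3)]
weekend = {"start": date(2010, 1, 2), "end": date(2010, 1, 2), "pct": 80}
schedule = apply_profile(days, 30, weekend)
```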
Clicking Load Profile 416 loads the profiles defined by the user. In one implementation, a user may re-use a profile only if it fits properly within the newly defined time interval; the dates could otherwise be out of range, e.g., if the work profile was defined for January 2010 but the current scenario time interval is the 3rd quarter of 2010. Clicking Save Profiles 418 saves the currently defined profiles into the work profile definition XML file. Clicking PROCESS 420 applies the currently displayed profiles to the previously defined and submitted scenario. In one implementation, clicking PROCESS 420 updates the charts with the requirements of the applied work profiles. Clicking Delete Profile 422 removes the selected profile from the system. Clicking Clear Profiles 424 closes all profiles currently displayed by the system.
Clicking Configuration 426 opens Page View 400, the initial parameter definition screen of the Work Profiles function of the Server Planner implementation. Clicking Work Profiles 428 allows the definition of further specific work profiles that require different workloads within a given time interval, as currently discussed and further discussed below in relation to
The VMachine Planner feature of the PCP enables the prediction of power consumption, heat dissipation, regional power costs, and regional greenhouse gas effects for virtualized or non-virtualized servers. In one implementation, it enables prediction regarding virtualized systems that can have heterogeneous and/or homogeneous characteristics including power draw footprints. Any number of these servers can be analyzed at the same time, each with specific potential, user defined workloads and for specific time periods. It is possible to obtain the total power budget of the physical underlying platform from the virtual machines defined within the VMachine Planner.
The VMachine Planner allows the prediction of power consumption of virtualized or non-virtualized servers that can have heterogeneous and/or homogeneous characteristics. This feature facilitates power and cooling budget planning where servers need to be moved to other physical locations within a data center or to remote locations. The VMachine Planner has charting capabilities similar to those of the Server Planner. The VMachine Planner also allows the stacking of plotted or graphed scenarios on the same charts. The system graphically and statistically compares the power, heat, cost, and greenhouse effects produced from different potential scenarios within the same individual charts.
Model Name 618 displays the models defined by the user in the entry cell selected. The user may highlight, for example by clicking, which model the user wishes the system to use for that session. In one implementation, only model names suffixed with REP may be used for predictions. In that implementation, all other models must first be created using the Model Generation implementation of the PCP, described below in relation to
Clicking Load VMs 628 loads the servers last configured in the VMachine Planner. Clicking Add VM 630 allows the user to add an additional server to the current data grid. Clicking Save VMs 632 stores the current data grid into an XML file. Clicking PROCESS 634 initiates the prediction process for all the servers in the current data grid. Clicking Delete VM 636 deletes highlighted or selected servers within the data grid. Clicking Clear VMs 638 clears all servers and associated parameters within the current data grid.
Clicking Configuration 640 opens Page View 600, the initial parameter definition screen of the VMachine Planner implementation. Clicking Power 642 displays the chart containing power usage estimates for the entered parameters. In one implementation, power is measured in kilowatts. Clicking Heat 644 displays the chart containing dissipated heat estimates for the entered parameters. In one implementation, heat dissipated is measured in BTUs. Clicking Cost 646 displays the chart containing cost estimates for the entered parameters. In one implementation, this is measured in U.S. dollars. Clicking CO2 648 displays the chart containing the regional CO2, SOx, and NOx output emission rate estimates for the entered parameters. In one implementation, these are measured in pounds/year. In another implementation the regions defined may be U.S. states. In one implementation the user may zoom in to a specific data point in any of the aforementioned charts, opened by clicking Power 642, Heat 644, Cost 646, or CO2 648, by clicking on the desired point within the given chart. Clicking Clear Charts 650 closes all charts currently displayed and displays the configuration screen, Page View 600. Clicking Close 652 closes the VMachine Planner window.
The Model Creation feature of the PCP allows a user to create a model suited for the user's own legacy or new platforms, whether virtualized or non-virtualized. This feature provides return on investment by extending the life and utility of the application.
The Model Creation feature allows the definition and creation of customized predictive models based, in one implementation, on two user input parameters: the idle power level of the fully configured system without running any workloads and the maximum workload power level for a specific server platform.
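A minimal sketch of how such a customized model might interpolate between the two user inputs, assuming a simple linear relationship between utilization and power draw (the idle and maximum figures below are hypothetical, not measured values):

```python
def make_model(idle_watts, max_watts):
    """Linear interpolation between idle draw and maximum-workload draw."""
    def predict(utilization):              # utilization as a 0.0-1.0 fraction
        u = min(max(utilization, 0.0), 1.0)
        return idle_watts + u * (max_watts - idle_watts)
    return predict

pe2900 = make_model(idle_watts=250.0, max_watts=600.0)   # figures assumed
```

Real server power curves are not perfectly linear in utilization, so this is only a first-order approximation of what a trained model would produce.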
Clicking Load Models 814 loads the models previously defined on that system. In one implementation, models already generated are suffixed with the characters “REP.” Clicking Save Models 816 saves the models shown in the current Model Creation screen to the XML model storage file. Clicking Add Model 818 places a basic empty entry onto the screen for user input convenience. Clicking PROCESS 820 generates the selected model. In one implementation, the model name will be suffixed with “REP” after successful creation. Clicking Delete Model 822 deletes models highlighted in the model creation screen. Clicking Clear Models 824 clears the screen completely. Clicking Close 826 closes the Model Creation screen.
The following is an example of the comma separated values (“.CSV”) format for the Data File 812. The first row must contain a header describing the column for the CPU utilization, the Memory utilization (for example, as percentages), and the power (for example, in watts) measured:
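An illustrative Data File consistent with that description (the header names and measurements below are hypothetical) can be parsed with Python's standard csv module:

```python
import csv, io

# Illustrative Data File contents (header names and rows are hypothetical):
# one header row, then CPU %, memory %, and measured power in watts.
SAMPLE = """\
CPU_Utilization,Memory_Utilization,Power
10,15,262
45,40,395
88,76,553
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
```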
The Synthetic Meter enables the prediction of power consumption, heat dissipation, regional power costs, and regional greenhouse gas effects for operational, metered or non-metered servers on-line in near real time. Resource utilization metrics, such as CPU and memory usage, are obtained from the operational system and input into the selected prediction models continuously, for example every second. The Synthetic Meter may use Windows WMI and Linux WMI/WBEM or the Top utility, for example, to obtain server resource utilization metrics. The Synthetic Meter may accept metrics from any data collection service over the network. Additionally, the Synthetic Meter also compares virtualized and/or non-virtualized servers by virtue of their corresponding models, enabling monitoring and comparison of power consumption and cooling requirements online, within the same display chart, for any servers connected to the network. The same monitoring and comparison capabilities may be available for each selected business application or task running on a particular machine, virtualized or non-virtualized. The system also can compare the power consumption predictions obtained for a business application or task between a number of machines, by virtue of the corresponding models used for each machine. This feature enhances many IT functions, for example server consolidation/relocation studies and hardware refresh projects, which involve the replacement of outdated legacy equipment with newer, more capable and efficient hardware. Power capping features at the server level as well as for specific applications or tasks may also be provided. Power capping is used to limit the amount of power consumed and/or the CPU and memory (resource) utilization by an operational system and/or business application running on a virtualized or non-virtualized machine.
The Synthetic Meter component of the PCP allows power, heat, cost, and CO2 emission prediction based on recently obtained, for example near-real-time, CPU and memory utilization values expressed as percentages of the total possible CPU and memory usage (for example, 50% of the total possible CPU usage) from operational systems. These are the independent variables to be input into the predictive models. A user may enter a business application or task name that is running on the entered host/machine to have the metering and predictions conducted for that application or task only. The same server and/or application may be entered multiple times with different models. This allows the user to dynamically compare the power, cooling, and emission rates across different platforms for the same host and/or applications by virtue of the different selected models. Finally, the Synthetic Meter may be used to cap the power available to a host or a specific application or task, allowing users to optimize performance while limiting resource utilization and/or cost.
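The metering loop can be sketched as follows. The sampler is a stand-in for the WMI/WBEM/Top collection mentioned above, and the linear model coefficients are hypothetical:

```python
# Sketch of the Synthetic Meter's polling loop: sample CPU/memory
# utilization each tick, feed the selected model, record the prediction.
import random, time

def sample_utilization():
    # Placeholder for a real collector (e.g., parsed WMI/WBEM or Top output).
    return random.uniform(0, 100), random.uniform(0, 100)

def meter(model, ticks, period_s=0.0):
    readings = []
    for _ in range(ticks):
        cpu, mem = sample_utilization()
        readings.append(model(cpu, mem))
        time.sleep(period_s)   # the text suggests about 1 s; 0 here for brevity
    return readings

# Hypothetical linear model: watts = 200 + 2*cpu% + 1.5*mem%
watts = meter(lambda cpu, mem: 200 + 2.0 * cpu + 1.5 * mem, ticks=5)
```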
Model Name 1012 displays the models defined by the user in the entry cell selected. The user may highlight, for example by clicking, which model the user wishes the system to use for that session. In one implementation, only model names suffixed with “REP” may be used for predictions. In that implementation, all other models must first be created using the Model Generation implementation of the PCP, described below in relation to
Clicking Load Hosts 1018 loads into the data grid (the window where the user enters data) previously defined models, hosts/machine names, and corresponding business application names or tasks. Clicking Add Host 1020 inserts a new entry into the data grid. Clicking Save Hosts 1022 saves the contents of the current data grid, for example into an XML file, for later retrieval and/or use. Clicking PROCESS 1024 starts the metering and prediction for the hosts and/or tasks defined in the displayed data grid. Clicking STOP 1026 stops currently running metering and prediction. Clicking Delete Host 1028 deletes selected rows from the displayed data grid. Clicking Clear Hosts 1030 clears all entries from the displayed data grid.
Clicking Configuration 1032 opens Page View 1000, the initial parameter definition screen of the Synthetic Meter implementation. Clicking Power 1034 displays the chart containing power usage estimates for the entered parameters. In one implementation, power is measured in kW. Clicking Heat 1036 displays the chart containing dissipated heat estimates for the entered parameters. In one implementation, heat dissipated is measured in BTU. Clicking Cost 1038 displays the chart containing cost estimates for the entered parameters. In one implementation, this is measured in U.S. dollars. Clicking CO2 1040 displays the chart containing the regional CO2, SOx, and NOx output emission rate estimates for the entered parameters. In one implementation, these are measured in lbs/year. In another implementation the regions defined may be U.S. states. In one implementation the user may zoom in to a specific data point in any of the aforementioned charts, opened by clicking Power 1034, Heat 1036, Cost 1038, or CO2 1040, by clicking on the desired point within the given chart. Clicking Clear Charts 1042 closes all charts currently displayed and displays the configuration screen, Page View 1000. Clicking Close 1044 closes the Synthetic Meter window.
After the resource utilization metrics are collected and sent back to the server back end via the internet, the resource utilization metrics for each machine, as well as each individual application, are input into each respective predictive model and power capping is performed if necessary (step 1108). Power capping limits the amount of power consumed and/or resources (CPU and memory) utilized by a host/machine and/or the business applications running on such machine. In one implementation, the limitation is enforced via software only, without hardware controls; for example, by tuning the application/task execution priority and core affinity, where core affinity is the number of CPU cores available for use by such application/task when executing. In one implementation, if the user has enabled the power capping feature and the model determines that the resource utilization is higher than the defined usage limit, power capping takes place. Once the resource utilization metrics are input into the respective models, the predicted power consumption values are sent in batch transmissions to the client front end at defined intervals, for example every 15 seconds (step 1110). Thus, the user may view the predicted values for each of the time series modeled (step 1112). In one implementation, the synthetic meter “stacks” multiple time series for each corresponding entered model on the same chart for comparison purposes. In one implementation, the predicted values view includes the mean, high, and low predictions for each value obtained from the prediction batch update sent in step 1110. In one implementation, the values viewed in step 1112 may be represented in graphical form. In one implementation, zooming capabilities and smart data tips for each time-series point are instantly available upon cursor positioning. In another implementation, each machine and/or individual application modeled may be plotted on a single graph or chart.
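The software-only capping decision described above may be sketched as follows. This is a minimal illustration: the `apply_power_cap` function, its overshoot heuristic, and the priority/core adjustments are assumptions for exposition, not taken from the disclosure.

```python
def apply_power_cap(predicted_util, usage_limit, cur_priority, cur_cores, min_cores=1):
    """Software-only power capping sketch: if the model's predicted
    resource utilization exceeds the user-defined usage limit, lower
    the task's execution priority and shrink its core affinity (the
    number of CPU cores the task may use while executing)."""
    if predicted_util <= usage_limit:
        return cur_priority, cur_cores           # within budget: no change
    overshoot = predicted_util / usage_limit     # e.g. 1.5 means 50% over the cap
    new_priority = cur_priority + 1              # demote priority (higher nice value)
    new_cores = max(min_cores, int(cur_cores / overshoot))
    return new_priority, new_cores
```

On Linux, the returned values could then be applied with calls such as `os.nice` and `os.sched_setaffinity`; the decision logic itself stays platform independent.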
The meter may continuously update the chart(s) as new batch transmissions arrive from the server back end at each defined interval. In one implementation, these updates overwrite the oldest interval on the chart, shifting the entire time series chronologically to display the most recent prediction batch(es).
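The chart update behavior, where each new batch overwrites the oldest interval and shifts the series chronologically, can be sketched with a bounded buffer. The class name and capacity handling are illustrative assumptions.

```python
from collections import deque

class SlidingSeries:
    """Keeps only the most recent prediction intervals; each new batch
    pushes out the oldest points, shifting the displayed time series
    chronologically toward the most recent prediction batch."""
    def __init__(self, capacity):
        self.points = deque(maxlen=capacity)   # maxlen evicts oldest automatically

    def on_batch(self, batch):
        """Called when a batch transmission arrives from the server back end."""
        self.points.extend(batch)

    def view(self):
        """Chronological snapshot for re-drawing the chart."""
        return list(self.points)
```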
The Power Estimator enables the prediction of power consumption, heat dissipation, regional power costs, and regional greenhouse gas effects based on operational server resource metrics previously collected from the entered machine/host name(s) and stored, for example, in XML files. This enables the user to obtain accurate knowledge of a server's past operational power consumption trends, which may be compared to “what-if” time-varying workloads, for example workloads defined by the Server Planner or VMachine Planner and any Workload Profiles defined within a given scenario's time interval.
The Power Estimator feature of PCP allows power, heat, cost, and CO2 emission predictions from previously measured independent variables, for example CPU utilization and memory utilization, as well as dependent variables, for example power utilization. Such data is known as “supervised test data” in the art of machine learning; notably, power consumption, the dependent variable, does not have to be measured. In one implementation, the Power Estimator will request predictions from every model defined for the particular server type entered, and statistically infer the best predictions from the models consulted. On the other hand, if a user enters the name of the user's own custom-generated model, the Power Estimator obtains the power consumption estimates from that model alone. In cases where power was also measured via a meter attached to the host/machine under study, the power consumption predictions may be graphically and statistically compared to the actual power measurements within the same chart(s).
Data Files Processed Menu 1218 displays data files already processed. Once the input parameters have been entered, clicking PROCESS 1220 selects the input file and invokes the selected models.
Clicking Configuration 1222 opens Page View 1200, the initial parameter definition screen of the Power Estimator implementation. Clicking Power 1224 displays the chart containing power usage estimates for the entered parameters. In one implementation, power is measured in kW. Clicking Heat 1226 displays the chart containing dissipated heat estimates for the entered parameters. In one implementation, heat dissipated is measured in BTU. Clicking Cost 1228 displays the chart containing cost estimates for the entered parameters. In one implementation, this is measured in U.S. dollars. Clicking CO2 1230 displays the chart containing the regional CO2, SOx, and NOx output emission rate estimates for the entered parameters. In one implementation, these are measured in lbs/year. In another implementation the regions defined may be U.S. states. In one implementation the user may zoom in to a specific data point in any of the aforementioned charts, opened by clicking Power 1224, Heat 1226, Cost 1228, or CO2 1230, by clicking on the desired point within the given chart. Clicking Clear Charts 1232 closes all charts currently displayed and displays the configuration screen, Page View 1200. Clicking Close 1234 closes the Power Estimator window.
The Anomaly Detector component of the PCP uses resource utilization pattern recognition to effect monitoring and classification of any potential anomalous resource utilization by any machine, virtualized or non-virtualized, and/or the business applications running on such machine. The Anomaly Detector detects potential intrusions in the system by detecting anomalous power and resource utilization fluctuations. The pattern recognition models can also detect anomalous resource utilization on any process or thread started on the machine, including OS processes and threads. For example, the Anomaly Detector may be used to detect malware infected OS processes and/or tasks. In order to lessen the frequency and probability of “false-positives,” or false alarms, a workload threshold can be defined to indicate the maximum expected workload of a machine and/or application(s). A manufacturer or user may also set a default value to be applied when such threshold has not been defined. User tunable “delta,” or difference, factors, each factor representing an allowable variability in the difference between the threshold and measured values, may be used to decide when thresholds have been truly exceeded.
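The first-layer workload threshold check with a tunable delta factor can be sketched as follows. The function name, the 10% default delta, and the 80% default threshold are illustrative assumptions, not values taken from the disclosure.

```python
def threshold_exceeded(measured, threshold=None, delta=0.10, default_threshold=80.0):
    """First-layer anomaly filter: the workload threshold check.
    A measurement counts as anomalous only when it exceeds the
    threshold by more than the tunable "delta" (difference) factor,
    which absorbs normal variability and lessens false positives.
    When no threshold was defined, a default value is applied."""
    limit = threshold if threshold is not None else default_threshold
    return measured > limit * (1.0 + delta)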
In one implementation, there are three layers of checks, or filters, to classify detected anomalies: (1) the workload threshold; (2) statistical derivatives calculated from additional input/output (“I/O”) activity metrics gathered at the system and application levels across the entire machine, including system-level activity (e.g., cache activity), system-wide and individual applications' processor, file system, and memory activity metrics (including the corresponding threads' activity metrics), the network interface connections' (“NICs,” e.g., network adapters') activity metrics (including errors and retries), and the storage subsystem(s)' activity metrics (e.g., logical and physical disks' activity, including the activity of NICs belonging to SANs and iSCSI storage controllers); and (3) a check against a rule-based, time-sensitive (or aged) direct-access repository of false-positive event exceptions, each entry keyed by the triplet composed of the classification model, the host/machine name or IP address, and the respective application. This repository may comprise a hash map class, providing deterministic average times for reads and writes, residing in memory and periodically stored to disk. In one implementation, this repository is dynamically updated when a user labels a positive event as a false-positive. To curtail the growth of the repository, each entry may be time-stamped when added to allow eventual removal after a user-defined “expiration” date/period. In one implementation, when repository rules reach their lifetime period, the user is asked whether such rules can be removed. If the user answers in the negative, the PCP may set extended lifetime periods on those rules.
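The third filter, the aged repository of false-positive exceptions keyed by (model, host, application) triplets, can be sketched with an in-memory hash map whose entries are time-stamped and expire after a defined lifetime. The class and method names are illustrative assumptions; persistence to disk is omitted for brevity.

```python
import time

class FalsePositiveRepository:
    """Rule repository keyed by the triplet (classification model,
    host/machine name or IP, application). Entries are time-stamped
    on insertion and aged out after a user-defined lifetime, which
    curtails unbounded growth of the repository."""
    def __init__(self, lifetime_seconds):
        self.lifetime = lifetime_seconds
        self.rules = {}                    # triplet -> insertion timestamp

    def label_false_positive(self, model, host, app, now=None):
        """Dynamically record a user-labelled false-positive event."""
        self.rules[(model, host, app)] = time.time() if now is None else now

    def is_known_false_positive(self, model, host, app, now=None):
        """Average O(1) lookup; expired rules are removed on access."""
        now = time.time() if now is None else now
        ts = self.rules.get((model, host, app))
        if ts is None:
            return False
        if now - ts > self.lifetime:       # rule has reached its lifetime
            del self.rules[(model, host, app)]
            return False
        return True
```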
It may be possible to monitor the same machine and/or business application(s) multiple times using different classification models by simply entering the same host/machine name multiple times, with each entry having a different classification model. This allows the user to dynamically assemble a majority voting of “anomaly detection experts” (by virtue of the different selected models) that can help identify false-positive events. An unusually high rate of false-positives sustained over time may indicate that a particular machine configuration has changed significantly in hardware and/or software. When this happens, the classification model for that machine may be regenerated to account for the changes in the machine configuration, so that the Anomaly Detector does not continue to generate a higher rate of false-positive events.
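The majority-voting combination of per-model verdicts can be sketched in a few lines. The strict-majority rule shown here is one reasonable interpretation of the "majority voting of anomaly detection experts"; the disclosure does not fix the exact voting rule.

```python
def majority_vote(verdicts):
    """Combine boolean verdicts (True = anomaly) from several
    classification models monitoring the same host/application.
    An event is confirmed only when a strict majority of the
    "experts" flags it, helping weed out single-model false positives."""
    positives = sum(1 for v in verdicts if v)
    return positives * 2 > len(verdicts)
```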
Resource utilization metrics may also be mined to identify the operational reliability of hardware and associated applications. The mined data may include the latest resource utilization, I/O activity, and statistical derivatives, which may include, for example, the mean, mode, high, low, and/or standard deviation for each significant metric collected regularly, such as disk, network, interprocess communication, and thread management metrics, and related metrics obtained from the OS. Anomalous events contain traces from the source machine to help understand the root cause of the anomaly. These traces comprise the statistical information, including the derivatives mentioned previously, as well as the machine name, the classification model used, and the application name. An anomaly thus can also indicate that a machine is failing or near failure, and/or that an application is malfunctioning.
The Anomaly Detector also enables users to identify and/or classify the type of workload, for example transactional, computational, CPU-only, or memory-only workloads, handled by individual applications. This ability has value in controlling resource costs through resource management and/or reallocation of assets. For example, memory-intensive applications may be shifted to slower-CPU systems, which cost less to operate than fast CPUs requiring high energy usage. Additionally, workload types may be aggregated to obtain a hierarchy of the most frequently handled workloads at the machine level. This allows optimization of machine configuration as well as predictions of current and future performance and reliability.
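A simple rule-based classifier over the collected metrics illustrates the workload typing described above. The threshold values and the precedence of the rules are illustrative assumptions; an actual implementation could equally use the pattern recognition models.

```python
def classify_workload(cpu_util, mem_util, io_rate,
                      cpu_hi=70.0, mem_hi=70.0, io_hi=100.0):
    """Heuristic workload-type classifier sketch: maps an application's
    CPU utilization (%), memory utilization (%), and I/O rate (ops/s)
    to one of the workload types named in the text. Thresholds are
    assumed defaults, tunable per deployment."""
    if io_rate > io_hi:
        return "transactional"          # I/O dominated
    if cpu_util > cpu_hi and mem_util > mem_hi:
        return "computational"          # both compute and memory bound
    if cpu_util > cpu_hi:
        return "CPU only"
    if mem_util > mem_hi:
        return "memory only"
    return "mixed/light"
```

Aggregating the returned labels per machine yields the hierarchy of most frequently handled workloads mentioned above.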
Clicking Load Hosts 1414 loads the previously defined models, including the rest of the fields of the data grid, into the screen/window data grid. Clicking Add Host 1416 inserts a new entry into the data grid. Clicking Save Hosts 1418 saves the contents of the current data grid, for example into an XML file, for later retrieval and/or use. Clicking PROCESS 1420 starts the anomaly detection for the hosts and/or tasks defined in the displayed data grid. Clicking STOP 1422 stops the currently running anomaly detector. Clicking Delete Host 1424 deletes any selected rows from the displayed data grid. Clicking Clear Hosts 1426 clears all entries from the displayed data grid.
Clicking Anomalies 1430 displays any anomalies or alarms detected by the system. In one implementation, this may be limited to anomalies or alarms detected within a defined time period, for example the last 10 minutes. Clicking Clear 1432 closes the chart currently displayed and displays the configuration screen, Page View 1400. Clicking Close 1434 closes the Anomaly Detector window.
This system may generate supervised training data for every CPU load from 5% to 90%. In some implementations, this may be performed in 5% increments, for example at 5% CPU load, 10% CPU load, 15% CPU load, etc. First, the system calculates the delta power (“deltapower”) and base power (“basepower”) based on the idle power level and maximum power level of the system (step 1800). For example, in some implementations, if deltapower is less than 100.0, basepower=deltapower*0.55; otherwise, basepower=deltapower*0.85.
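Step 1800 may be sketched as follows, assuming (consistent with step 1804 below) that deltapower is the spread between the idle and maximum power levels; the function name is illustrative.

```python
def delta_and_base_power(idle_power, max_power):
    """Step 1800 sketch: deltapower is the delta between the system's
    idle and maximum power levels; basepower is derived from it using
    the piecewise factors stated in the text (0.55 when deltapower is
    below 100.0, else 0.85)."""
    deltapower = max_power - idle_power
    basepower = deltapower * 0.55 if deltapower < 100.0 else deltapower * 0.85
    return deltapower, basepower
```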
Next, the system determines the CPU variability for every CPU value from 5% to 90% workload in 5% step increments with a variability of +/−5% (step 1802). An example of code for this step is as follows—
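A minimal Python sketch of this step; the function names and the uniform jitter mechanics are assumptions, as the original listing is not reproduced here.

```python
import random

def cpu_levels():
    """Nominal CPU loads covered by the generator: 5% to 90% in 5% steps."""
    return list(range(5, 95, 5))

def cpu_with_variability(cpu_load, spread=5.0, rng=None):
    """Step 1802 sketch: jitter the nominal CPU load by +/-5% to
    emulate operational variability, clamped to the valid [0, 100] range."""
    rng = rng or random.Random()
    jitter = rng.uniform(-spread, spread)
    return min(100.0, max(0.0, cpu_load + jitter))
```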
Then, the system determines the range of the power estimation based on the delta difference between the idle and maximum powers, using the CPU load determined in step 1802 and the deltapower calculated in step 1800 (step 1804). An example of code for this step is as follows—
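A minimal Python sketch of this step; the linear interpolation across the idle-to-maximum delta and the 5% spread around the estimate are assumptions, as the original listing is not reproduced here.

```python
def power_range(idle_power, max_power, cpu_load, spread=0.05):
    """Step 1804 sketch: estimate the power draw for a given CPU load
    by interpolating linearly across the idle-to-maximum delta, then
    derive a +/- range around the estimate (5% spread assumed)."""
    deltapower = max_power - idle_power
    estimate = idle_power + deltapower * (cpu_load / 100.0)
    return estimate * (1.0 - spread), estimate, estimate * (1.0 + spread)
```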
The system next performs a series of steps to approximate each point in a probability distribution with a set number of total points (step 1806). In some implementations, the probability distribution may have 200 total points. First, the system determines the adjustment factor for the CPU load specified in step 1800 based on the location of the given probability distribution point (step 1808). Next, the system calculates the CPU utilization and respective power draw (step 1810). Taking into account the fact that when CPU usage peaks, memory usage generally drops and power draw generally peaks (step 1812), the system then calculates memory usage and adjusts the calculated CPU utilization and power draw from step 1810 (step 1814). An example of code for these steps is as follows—
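A minimal Python sketch of steps 1806 through 1814; the bell-shaped adjustment factor, the inverse CPU-to-memory relation, and all constants are illustrative assumptions, as the original listing is not reproduced here.

```python
import math
import random

def generate_training_points(cpu_load, idle_power, max_power,
                             n_points=200, rng=None):
    """Steps 1806-1814 sketch: approximate a probability distribution
    with n_points samples around the nominal CPU load. The adjustment
    factor (step 1808) depends on each point's location in the
    distribution; CPU utilization and power draw are computed per
    point (step 1810); memory usage moves inversely to CPU, so that
    when CPU peaks, memory drops and power draw peaks (steps 1812-1814)."""
    rng = rng or random.Random()
    deltapower = max_power - idle_power
    rows = []
    for i in range(n_points):
        # Adjustment factor from the point's location (step 1808):
        # largest at the centre of the distribution, near zero at the tails.
        x = (i - n_points / 2.0) / (n_points / 6.0)
        adjust = math.exp(-0.5 * x * x)
        # CPU utilization and respective power draw (step 1810).
        cpu = min(100.0, max(0.0, cpu_load * (0.9 + 0.2 * adjust)))
        power = idle_power + deltapower * cpu / 100.0
        # Memory inversely tracks CPU, with slight jitter (steps 1812-1814).
        mem = min(100.0, max(0.0, 100.0 - cpu * (0.8 + 0.1 * rng.random())))
        rows.append((cpu, mem, power))
    return rows
```

Each returned (CPU, memory, power) tuple corresponds to one training record stored in step 1816.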
Then, the system stores the calculated resource utilization (both CPU and memory) and the respective power draw into the training data file (step 1816). Finally, steps 1806-1816 are repeated for each point in the probability distribution, and steps 1802-1816 are repeated for each incremental CPU load to be calculated in accordance with step 1800 (step 1818).
The foregoing description of various embodiments provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice in accordance with the present invention. It is to be understood that the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
This application claims benefit to U.S. Provisional Patent Application Ser. No. 61/378,928 filed Aug. 31, 2010, entitled “Method and System for Power Capacity Planning” which is incorporated by reference herein.