Many organizations utilize data centers to provide centralized computational and/or storage services. Data centers are relatively expensive to operate as a result of costs to power and cool data center resources. Data center resources include, for example, servers, storage components, network switches, processors, and/or controllers. As demand for data center resources increases, the energy to operate data center resources can increase.
Data centers are commonly used by organizations (e.g., governments, corporations, companies, network providers, service providers, cloud computing operators, etc.) for computing and storage resources (e.g., data center resources). Data center resources include, for example, servers, processors, network switches, memories, and/or controllers. Data center resources consume relatively large amounts of power to operate. For example, the Environmental Protection Agency projects that energy consumption by data centers in the United States could exceed 100 billion kilowatt hours (kWh) in 2011, which could cost data center operators an estimated $7.4 billion. Rising energy costs, regulatory power requirements, and social concerns regarding greenhouse gas emissions have made reducing power consumption a priority for data center operators.
Many data center operators use various methods to manage data center workload capacity and power through static or dynamic provisioning of data center resources. A workload includes requests from users to access and/or utilize resources in a data center. A workload can include and/or utilize, for example, data center bandwidth, processing power, a number of servers, server space, and/or memory. As the dynamics of computing change from centralized workstations to distributive computing via data centers, data centers are processing increasingly larger workloads.
Some data centers implement a static configuration that assumes there is a stationary demand pattern. These data centers provision resources as some percentage of peak demand and do not change the provisioning. However, time-varying demand patterns of resource usage can result in data centers being over-provisioned (e.g., allocating too many resources) during some time periods and being under-provisioned (e.g., allocating too few resources) during other time periods. During instances of over-provisioning, the data centers generally consume excess power. Additionally, during instances of under-provisioning, data center resources may violate service level agreements (SLAs) resulting in lost business revenue.
Other data centers may use dynamic and/or real time solutions that use reactive routines and/or algorithms to monitor workload demands and turn data center resources off/on in response to an actual workload. While these dynamic and/or real time solutions can potentially save power consumption by correlating available resources to actual workloads, the relatively frequent provisioning of the resources can make the data centers unstable. For example, frequently provisioning resources may cause a loss of service. Additionally, frequent provisioning of resources can increase data center overhead (e.g., time to provision resources, operator time to manage the provisioning, and/or system diagnostics to maintain data center stability) and/or can result in frequent power cycling of data center resources, thereby causing wear on the resources. The wear may result in service outages and/or increased costs associated with frequently repairing and/or replacing the data center resources. Further, these dynamic and/or real time solutions are sometimes not preferred by data center operators because the operators want to verify and/or approve new provisioning configurations of data center resources.
Example methods, apparatus, and/or articles of manufacture disclosed herein provide for greater efficiency of data centers by identifying a representative workload pattern for a data center, provisioning a first portion of the data center resources for a base workload for time intervals based on the representative workload pattern, and configuring a second portion of the data center resources to be on standby to process excess workloads. A representative workload pattern is a mean, average, and/or other baseline pattern of measured and/or actual historical data center resource usage.
The example methods, apparatus, and articles of manufacture disclosed herein determine a base workload from a representative workload pattern by determining a number of resources to provision that can process a majority (e.g., 60%, 75%, 90%, etc.) of the previously measured workload. A base workload specifies a number and/or an amount of resources to be provisioned during a time interval. A workload includes, for example, a number of users, request rates, a number of transactions per second, etc. Some such example methods, apparatus, and/or articles of manufacture disclosed herein create base workloads that are substantially constant during respective time intervals so that additional data center resources are not provisioned during time intervals corresponding to the baseline usage patterns. In other words, example methods, apparatus, and/or articles of manufacture disclosed herein provision data center resources at the start of each time interval based on a base workload specified for that time interval and additional resources may not be brought online during that interval.
Example methods, apparatus, and/or articles of manufacture disclosed herein utilize patterns of data centers and/or patterns in workloads to determine representative workload patterns for the data centers. Example methods, apparatus and/or articles of manufacture disclosed herein resolve a representative workload pattern with a routine that reduces a number of time intervals throughout a period of interest, thereby reducing a number of times data center resources are provisioned or shut down during a period (e.g., an hour, a day, a week, etc.). The example routine incorporates, for example, costs and/or risks associated with provisioning data center resources and/or under-provisioning such resources.
Example methods, apparatus, and/or articles of manufacture disclosed herein reduce data center energy consumption by having a first portion of data center resources provisioned to manage a base workload at relatively coarse time intervals (e.g., hours) and having a second portion of data center resources that are configured to be reactive to manage excess workloads that exceed the base workload and/or a threshold based on the base workload. In this manner, the first portion of the data center resources are configured to manage long term patterns of workload while the second portion of the data center resources are configured to manage relatively brief spikes in workload from, for example, flash crowds, service outages, holidays, etc.
The provisioning of data center resources by example methods, apparatus, and/or articles of manufacture disclosed herein allocates a number of data center resources needed to process a workload, thereby meeting SLA requirements while reducing the number of times data center resources are activated and/or deactivated. Reducing the number of times data center resources are activated/deactivated reduces wear, energy, repair time, and/or resource replacement costs. By reducing an amount of provisioning of data center resources, example methods, apparatus and/or articles of manufacture disclosed herein ensure that data centers remain stable and within configurations approved by data center operators. Additionally, example methods, apparatus, and/or articles of manufacture disclosed herein are able to adapt to any workload demand pattern and/or data center environment. Further, example methods, apparatus, and/or articles of manufacture disclosed herein may be customized by data center operators to accommodate different structures of data centers and/or beneficial balances between power consumption and SLA violations.
While example methods, apparatus, and/or articles of manufacture disclosed herein are described in conjunction with data center resources including, for example, servers, example methods, apparatus and/or articles of manufacture disclosed herein may provision any type of computing resource including, for example, virtual machines, processors, network switches, controllers, memories, databases, computers, etc. Further, while example methods, apparatus and/or articles of manufacture disclosed herein are described in conjunction with an organization, example methods, apparatus and/or articles of manufacture disclosed herein may be implemented for any type of owner, leaser, customer, and/or entity such as network providers, service providers, cloud computing operators, etc.
The example organization 102 illustrated in
The example data center 108 of
In the illustrated example, the first portion of resources 110 includes resources A1-AN and the second portion of resources 112 includes resources B1-BN. The resources A1-AN and B1-BN include any amount and/or type(s) of server, blade server, processor, computer, memory, database, controller, and/or network switch that can be provisioned. Based on representative workload patterns and/or calculated base workloads, resources may be exchanged between the two portions 110 and 112. Thus, the example portions 110 and 112 represent logical groups of the resources A1-AN and B1-BN based on provisioning. For example, the resource A1 may be physically adjacent to the resource B1 (e.g., adjacent blade servers within a server rack) while the resource A2 is located 3,000 miles from the resources A1 and B1. In other examples, the portions 110 and 112 may also represent physical groupings of the resources A1-AN and B1-BN. There may be the same or different numbers and/or types of resources within the first portion of resources 110 and the second portion of resources 112.
The example server 106 and the example data center 108 of
To provision the example portions of resources 110 and 112 within the example data center 108, the example server 106 of
To collect workload data patterns, the example resource manager 116 of
The workload monitor 118 of the illustrated example collects workload data patterns over time periods. For example, the workload monitor 118 may monitor workload data patterns during 24 hour time periods. In other examples, the workload monitor 118 may monitor data patterns during weekly time periods, bi-weekly time periods, monthly time periods, etc. By collecting data patterns over time periods, a base workload forecaster 120 (e.g., a processor) can characterize the workload data patterns for a same time of day. For example, workload data patterns collected over a plurality of 24 hour time periods by the workload monitor 118 can be compared by the base workload forecaster 120 to determine a 24 hour representative workload pattern. Alternatively, the workload monitor 118 may probe workloads of the workstations 104 during different time periods. In these alternative examples, the example base workload forecaster 120 resolves and/or normalizes the different time periods to determine a representative workload pattern.
After collecting workload data patterns, the example workload monitor 118 of
To determine a representative workload pattern, the example resource manager 116 of
To determine a representative workload pattern from among a plurality of workload data patterns, the example base workload forecaster 120 performs, for example, a periodicity analysis. An example periodicity analysis includes a time-series analysis of workload data patterns to identify repeating workload patterns within a time period. The example base workload forecaster 120 performs the periodicity analysis using a Fast Fourier Transform (FFT). The base workload forecaster 120 then determines a representative demand using an average and/or a weighted average of historical workload data. The weights for the weighted average may be selected using linear regression to reduce a sum-squared error of projection of the representative workload pattern. An example of the base workload forecaster 120 using workload data patterns to determine a representative workload pattern is described in conjunction with
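For illustration only, the FFT-based periodicity analysis and the weighted-average combination described above might be sketched as follows; the function names, the sample interval, and the use of NumPy are assumptions made for this sketch rather than details from the disclosure:

```python
import numpy as np

def find_dominant_period(samples, sample_interval_s=300):
    """Estimate the dominant repeating period in a workload trace
    using an FFT-based periodicity (time-series) analysis."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()                      # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=sample_interval_s)
    k = np.argmax(spectrum[1:]) + 1       # skip the zero-frequency bin
    return 1.0 / freqs[k]                 # dominant period in seconds

def representative_pattern(daily_traces, weights=None):
    """Combine several per-period traces into a representative
    workload pattern via an average or weighted average."""
    traces = np.asarray(daily_traces, dtype=float)
    return np.average(traces, axis=0, weights=weights)
```

In practice, the weights passed to `representative_pattern` could be fitted by linear regression, as the disclosure suggests, to reduce the sum-squared projection error.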
After determining a representative workload pattern, the example base workload forecaster 120 determines a number of time intervals. The example base workload forecaster 120 determines a number of time intervals to determine when the first and second portions of resources 110 and 112 are to be provisioned based on the representative workload pattern. In other examples, the base workload forecaster 120 determines a number of time intervals and determines a corresponding base workload for each time interval based on corresponding collected workload data patterns. The example base workload forecaster 120 of such other examples then combines the base workloads to create a representative workload pattern for a time period.
The example base workload forecaster 120 of
In an example of determining a number of provisioning intervals, the example base workload forecaster 120 uses a dynamic programming solution (e.g., a routine, function, algorithm, etc.). An example equation (1) is shown below to illustrate an example method of summing variances of projected demand (e.g., mean(demand) − demand) over a number of provisioning intervals (n). The example base workload forecaster 120 uses a dynamic programming solution to reduce a number of provisioning time intervals while reducing the sum of variances.
Σi=1 to n ( mean(demand([ti−1, ti])) − demand([ti−1, ti]) )^2 + (n − 1) (1)
Solving the example equation (1) using a dynamic programming solution simultaneously reduces a number of provisioning changes to the data center 108 and a workload representation error as a difference between a base workload during a time interval and a representative workload pattern. In the example equation (1), the demand variable is a projected demand (e.g., the representative workload pattern) for each time interval. The number of time intervals may be expressed by example equation (2):
{[t0, t1]}, {[t1, t2]}, . . . , {[tn−1, tn]} (2)
In the example equation (2), the time period is defined as t0 = 0 and tn = 86400 (i.e., the number of seconds in 24 hours). In other examples, the time period may be relatively longer (e.g., a week) or relatively shorter (e.g., an hour). In this example, the example equation (1) is configured based on an assumption that the base workload is relatively constant. In other words, the example equation (1) is utilized based on an assumption that data center resources are not re-provisioned (i.e., added or removed) during a time interval.
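The minimization underlying the example equations (1) and (2) can be sketched with a standard dynamic programming segmentation, which trades the squared error of representing each interval by its mean against a fixed cost per additional interval. The helper names and the default penalty weight below are illustrative assumptions, not details from the disclosure:

```python
import numpy as np

def partition_intervals(demand, penalty=1.0):
    """Partition a projected demand trace into provisioning intervals
    by dynamic programming, balancing per-interval squared error
    against a fixed cost for each extra interval."""
    d = np.asarray(demand, dtype=float)
    n = len(d)
    s1 = np.concatenate(([0.0], np.cumsum(d)))      # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(d * d)))  # prefix sums of squares

    def sse(j, i):
        # squared error of approximating d[j:i] by its mean
        m = (s1[i] - s1[j]) / (i - j)
        return (s2[i] - s2[j]) - (i - j) * m * m

    best = np.full(n + 1, np.inf)
    best[0] = -penalty          # the first interval carries no penalty
    prev = np.zeros(n + 1, dtype=int)
    for i in range(1, n + 1):
        for j in range(i):
            cost = best[j] + sse(j, i) + penalty
            if cost < best[i]:
                best[i] = cost
                prev[i] = j
    bounds = [n]                # recover boundaries t0, t1, ..., tn
    while bounds[-1] > 0:
        bounds.append(int(prev[bounds[-1]]))
    return list(reversed(bounds))
```

A larger penalty yields fewer, coarser intervals (fewer provisioning changes); a smaller penalty tracks the projected demand more closely.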
After determining a number of time intervals to be provisioned, the example base workload forecaster 120 determines a base workload for each such time interval using the representative workload pattern. In some instances, the example base workload forecaster 120 may determine a base workload for each time interval by determining an average of the representative workload pattern during the time interval. In other examples, the base workload may be determined by finding 90% of the peak (or another specified fraction of the peak) workload demand of the representative workload pattern during the time interval. In yet other examples, data center operators may specify criteria for determining a base workload for a time interval based on the representative workload pattern.
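A minimal sketch of the interval-level base workload computation described above (the method names and the 90% default are illustrative assumptions):

```python
import numpy as np

def base_workload(pattern_interval, method="mean", fraction=0.9):
    """Derive one base-workload level for a provisioning interval of
    the representative pattern, either as the interval mean or as a
    fraction (e.g., 90%) of the interval's peak demand."""
    x = np.asarray(pattern_interval, dtype=float)
    if method == "mean":
        return float(x.mean())
    if method == "peak_fraction":
        return fraction * float(x.max())
    raise ValueError(f"unknown method: {method}")
```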
Further, the example base workload forecaster 120 may determine thresholds in relation to the base workload for each time interval. The thresholds may indicate to the resource manager 116 when the second portion of resources 112 is to be utilized. In other words, the thresholds may instruct the resource manager 116 to prepare at least some of the resources B1-BN for processing workload before the capacity of the first portion of resources 110 is reached. In other examples, the thresholds may be the base workloads.
To provision the example first portion of resources 110, the example resource manager 116 of
After determining the appropriate time interval, the example predictive controller 124 identifies the base workload for the time interval and determines an amount of resources of various type(s) that corresponds to the base workload. In some instances, the example predictive controller 124 may determine an amount of resources based on one or more type(s) of application(s) hosted by the resource(s) because different applications may have different response time targets (e.g., SLAs). For example, different amounts of resources may be provisioned by the predictive controller 124 for web searching, online shopping, social networking, and/or business transactions.
In some examples, the predictive controller 124 provisions resources within the data center 108 using, for example, queuing modeling theory. For example, an application hosted by the resources A1-AN and B1-BN of the illustrated example may have an SLA response time of t seconds, a mean incoming arrival rate of l jobs per second (e.g., workload), and a mean resource processing speed of u jobs per second. Based on this information, the predictive controller 124 may then solve example equation (3) to determine an amount of resources to provision.
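Because the example equation (3) itself is not reproduced in this text, the following sketch assumes a simple M/M/1-style response-time model as one plausible instance of the queuing modeling described above; the specific sizing formula is an assumption, not the disclosure's equation. With the workload split evenly across n servers, each server sees l/n jobs per second, and the M/M/1 response time 1/(u − l/n) must not exceed the SLA target t, giving n ≥ l/(u − 1/t):

```python
import math

def servers_needed(arrival_rate, service_rate, sla_response_s):
    """Estimate a server count from an assumed M/M/1-style model:
    arrival_rate (l) in jobs/s, service_rate (u) in jobs/s per
    server, sla_response_s (t) in seconds."""
    l, u, t = arrival_rate, service_rate, sla_response_s
    if u <= 1.0 / t:
        # even an unloaded server responds slower than the SLA target
        raise ValueError("a single server can never meet the SLA")
    return math.ceil(l / (u - 1.0 / t))
```

For example, an application with an SLA of 0.1 seconds, servers that process 110 jobs per second, and a workload of 1,000 jobs per second would be provisioned ten servers under this assumed model.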
In other examples, the predictive controller 124 provisions resources based on other equation(s), algorithm(s), routine(s), function(s), or arrangement(s) specified by data center operators. In examples where resources are to concurrently host different type(s) of application(s), the example predictive controller 124 may apply different SLA requirements for each application type, and provision specific resource(s) for each application type. In other examples, the predictive controller 124 may aggregate and/or average SLA requirements for different application type(s) and provision the first portion of resources 110 to host the different applications.
The example predictive controller 124 of
The example predictive controller 124 of
To provision the second portion of resources 112 for processing actual workloads that exceed base workloads for respective time intervals, the example resource manager 116 of
The example alert instructs the reactive controller 126 to provision the second portion of the resources 112. The example alert may also inform the example reactive controller 126 about an amount of excess workload that exceeds a base workload for a time interval. In response to receiving an alert, the example reactive controller 126 determines an amount of resources to provision and instructs the data center 108 to provision those resources (e.g., from the second portion of resources 112). In other examples, the reactive controller 126 immediately provisions additional resources to proactively process relatively fast increases in actual workload. In this manner, the reactive controller 126 provisions resources from among the second portion of resources 112 that are used to process excess actual workload, thereby reducing energy costs while meeting SLA requirements and maintaining customer satisfaction.
When the actual workload recedes below a base workload during a time interval, the example reactive controller 126 of
The reactive controller 126 of the illustrated example provisions the second portion of resources 112 in substantially the same manner as the predictive controller 124 provisions the first portion of resources 110. For example, the reactive controller 126 may use the example equation (3) to determine a number of resources to provision based on the excess actual workload.
To allocate actual workload among the portions of resources 110 and 112, the example resource manager 116 of
The example coordinator 128 of
The example coordinator 128 of the illustrated example determines when a threshold is exceeded by monitoring a number of requests received. In other examples, the coordinator 128 monitors an amount of bandwidth requested, an amount of bandwidth consumed, an amount of data transferred, and/or an amount of processor capacity being utilized. When an actual workload drops below a threshold, the example coordinator 128 migrates workload from the second portion of resources 112 to the first portion of resources 110. The example coordinator 128 then instructs the reactive controller 126 to deactivate the second portion of resources 112.
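The threshold-based allocation performed by the example coordinator 128 might be sketched as follows; this is a simplified illustration that splits a single observed request rate between the two portions, whereas an actual coordinator could track several of the metrics noted above:

```python
def route_workload(actual_rps, base_threshold_rps):
    """Split an observed request rate between the provisioned base
    portion and the reactive standby portion. Returns the rate sent
    to each portion; the standby share is zero while the actual
    workload stays at or below the threshold."""
    to_base = min(actual_rps, base_threshold_rps)
    to_standby = max(0.0, actual_rps - base_threshold_rps)
    return to_base, to_standby
```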
To enable data center operators (and/or other personnel associated with the organization 102) to interface with the resource manager 116, the example resource manager 116 of
The example resource manager interface 130 of
While an example manner of implementing the example system 100 has been illustrated in
In the example of
The example base workload forecaster 120 uses the representative workload pattern 207 to determine a number of intervals to create a base workload 208. In this example, the base workload forecaster 120 determines that the representative workload pattern 207 is to be partitioned into three time intervals 210-214 of different durations. In other examples, the base workload forecaster 120 may partition a representative data pattern into fewer or more time intervals. The example base workload forecaster 120 determines a number of time intervals in which provisioning is to occur to reduce a number of times the data center 108 is provisioned while reducing an error between the base workload 208 and the data patterns 202-206.
In the example of
In the illustrated example of
In the example of
In the illustrated example, the data patterns 202-206 and workloads 208, 220, 224, and 226 are shown during a 24 hour time period to illustrate an example base workload 208 over time. In other examples, the base workload 208 and/or the data patterns 202-206 may be for relatively longer or shorter time periods. Additionally in many examples, the actual workload 220 is received in real-time by the coordinator 128 and allocated among the portions of resources 110 and 112 without specifically knowing previous and/or future actual workloads.
The example graphs 302-306 show performances of the example resource manager 116 using the example method to provision resources among two portions of data center resources, as described herein. In the example graphs 302-306, the example method described herein is referred to as a hybrid-variable method 308. The example graphs 302-306 compare the example hybrid-variable method 308 of provisioning resources in a data center to other methods of provisioning resources. Non-limiting examples of methods to provision resources, and uses of the example methods, are described below.
The predictive 24 hours method (e.g., Predictive 24 hrs) provisions resources in the data center once every 24 hours. The predictive 6 hour method (e.g., Predictive 6 hrs) includes partitioning a 24 hour time period into four equal six hour time intervals and provisioning resources at the start of each interval. The predictive 1 hour method (e.g., Predictive 1 hrs) includes partitioning a 24 hour time period into 24 equal one hour time intervals and provisioning resources at the start of each interval. The predictive-variable method (e.g., Predictive/var) includes partitioning a 24 hour time period into a variable number of time intervals based on a day of the week and provisioning resources at the start of each interval. The reactive method (e.g., Reactive) monitors actual workload in ten minute time intervals and uses this information to provision resources for the next ten minutes. The hybrid-fixed (e.g., Hybrid/fixed) method includes partitioning a 24 hour time period into 24 equal one hour time intervals, provisioning resources at the start of each interval, and having a second portion of resources available when actual workload exceeds a base workload. These methods use a base workload for each time interval that is about 90% of a previously monitored peak workload during the respective time interval of a time period.
The example graph 302 of
The example graph 306 shows a number of provisioning changes during the five week period for each of the methods. In particular, the example hybrid-variable method has more provisioning changes than the predictive 24 hour method, the predictive 6 hour method, and the predictive-variable method, but fewer changes than the other methods. While the predictive 6 hour method and the predictive-variable method result in a data center having fewer provisioning changes than the example hybrid-variable method 308, a data center using the predictive 6 hour method and the predictive-variable method had more SLA violations than the example hybrid-variable method 308. Further, a data center using the example hybrid-variable method 308 consumed less power than a data center using the predictive 24 hour method. Thus, the example graphs 302-306 show that the example hybrid-variable method 308, utilized by the example resource manager 116 described herein, reduces SLA violations without increasing power consumption or a number of provisioning changes.
The example graph 402 illustrates an actual workload over the 24 hour time period. In this example, the actual workload varied between 0 and 2 million requests per second. The example graphs 404-408 show the performance of the methods over the 24 hour time period based on the actual workload in the example graph 402. The example graph 404 shows a number of servers (e.g., resources) provisioned by the example hybrid-variable method 308 compared to the other methods. The example hybrid-variable method 308 includes a base workload that is partitioned into three time intervals (e.g., 0 to about 7 hours, 7 to about 17.5 hours, and 17.5 to 24 hours). The example graph 404 shows that the example hybrid-variable method 308 has the fewest provisioning changes among the measured methods while provisioning about a same number of servers as the predictive 1 hour method and the hybrid-fixed method.
The example graph 406 shows an amount of power consumed by the methods during the 24 hour time period. Similar to the results in the example graph 402, a data center using the example hybrid-variable method 308 consumed about the same amount of power as a data center using the predictive 1 hour method and the hybrid-fixed method while having substantially fewer provisioning changes than the other methods. The example graph 408 shows a number of SLA violations per hour for each of the methods during the 24 hour time period. Similar to the results in the example graphs 402 and 404, the example hybrid-variable method 308 resulted in about the same number of SLA violations as the predictive 1 hour method and the hybrid-fixed method while having substantially fewer provisioning changes than the other methods. Further, while the example reactive method generally used fewer servers (as shown by the graph 404) and consumed less power (as shown by the graph 406), the example reactive method had more SLA violations than the example hybrid-variable method 308.
A flowchart representative of example machine readable instructions for implementing the resource manager 116 of
As mentioned above, the example machine readable instructions of
The example machine-readable instructions 500 of
The example predictive controller 124 then implements the determined base workload for a data center (block 510). The example predictive controller 124 and/or the base workload forecaster 120 may also store the base workload to the example data pattern database 122. To provision resources, the example predictive controller 124 determines a current time and identifies a time interval that corresponds to the current time (blocks 512 and 514). The example predictive controller 124 then determines the base workload for the determined time interval (block 516).
The example predictive controller 124 then provisions a first portion of resources within the data center based on the determined base workload (block 518). The example coordinator 128 then receives an actual workload and determines if the actual workload exceeds the current base workload and/or a threshold associated with the base workload (block 520). If the base workload is exceeded, the example coordinator 128 instructs the example reactive controller 126 to provision a second portion of resources within the data center to process the actual workload that exceeds the threshold and/or the base workload (e.g., excess workload) (block 522). The example coordinator 128 and/or the predictive controller 124 then determines if a current time corresponds to (e.g., is within X minutes of) an end of the current time interval (block 524). Additionally, if the example coordinator 128 determines that there is no excess workload for the current time interval (block 520), the example coordinator 128 routes the requests for resources to the first portion of resources. The example coordinator 128 and/or the predictive controller 124 also determine if the current time corresponds to (e.g., is within X minutes of) an end of the current time interval (block 524).
If the current time interval is ending within a threshold time period (e.g., within 10 minutes, 5 minutes, 1 minute, etc.), the example resource manager 116 determines if additional data patterns are to be monitored to create a new base workload (block 526). However, if the current time interval is not ending, the example coordinator 128 receives additional actual workloads and determines if the workloads exceed the base workload and/or associated threshold (block 520). If additional data patterns are to be collected, the example workload monitor 118 collects the additional data patterns to create a new representative workload pattern (block 502). If additional data patterns are not to be collected, the example predictive controller 124 determines a current time to provision the first portion of resources in the data center for the next time interval (block 512).
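The flow of blocks 512-524 described above might be approximated by the following simulation sketch; the data structures, callback, and capacity function are illustrative assumptions rather than the disclosure's machine readable instructions:

```python
def run_period(intervals, base_workloads, actual_trace, capacity):
    """Simulate one period of the hybrid scheme. `intervals` lists
    interval boundary times (t0..tn), `base_workloads` gives the
    base level per interval, `actual_trace` is the observed workload
    at each time step, and `capacity` maps a workload level to a
    resource count. Returns the provisioning events that occur."""
    events = []
    for t, actual in enumerate(actual_trace):
        # block 514: identify the interval containing the current time
        i = next(k for k in range(len(intervals) - 1)
                 if intervals[k] <= t < intervals[k + 1])
        base = base_workloads[i]
        if t == intervals[i]:
            # block 518: provision the base portion at interval start
            events.append(("base", t, capacity(base)))
        if actual > base:
            # block 522: the reactive portion absorbs the excess
            events.append(("standby", t, capacity(actual - base)))
    return events
```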
The example machine-readable instructions 600 of
The example predictive controller 124 next provisions a first portion of resources within the data center based on the determined base workload (block 610). The example coordinator 128 instructs the example reactive controller 126 to provision a second portion of resources within the data center to process the actual workload that exceeds the threshold and/or the base workload (e.g., excess workload) (block 612). The example workload monitor 118 may continue collecting data patterns to modify and/or adjust the representative workload pattern (blocks 602-606) based on workload changes.
The processor platform P100 of
The processor P105 is in communication with the main memory (including a ROM P120 and/or the RAM P115) via a bus P125. The RAM P115 may be implemented by dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), and/or any other type of RAM device, and the ROM P120 may be implemented by flash memory and/or any other desired type of memory device. The tangible computer-readable memory P150 may be any type of tangible computer-readable medium such as, for example, a compact disk (CD), a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), and/or a memory associated with the processor P105. Access to the memory P115, the memory P120, and/or the tangible computer-readable medium P150 may be controlled by a memory controller.
The processor platform P100 also includes an interface circuit P130. Any type of interface standard, such as an external memory interface, serial port, general-purpose input/output, etc., may implement the interface circuit P130. One or more input devices P135 and one or more output devices P140 are connected to the interface circuit P130.
Although certain example methods, apparatus and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent either literally or under the doctrine of equivalents.