Modern economies and business services typically run complex, dynamic, and heterogeneous Information Technology (IT) computer infrastructures. For example, computer infrastructures can include one or more servers or host devices and one or more storage arrays interconnected by communication devices, such as switches or routers. The server devices can be configured to execute one or more virtual machines (VMs) during operation. Each VM can be configured to execute or run one or more applications or workloads. Such workloads can be executed as part of on-premise (datacenter) and off-premise (public/private cloud) environments.
During operation, performance issues can affect applications executed in cloud/virtualization environments and service levels associated with the infrastructure. As a result, systems administrators utilize a variety of monitoring and management tools to detect and address these performance issues.
As provided above, conventional monitoring and management tools can provide a systems administrator with insight into the behavior of the computer infrastructure at a given time. However, these conventional tools do not provide an indication of operation of the computer environment resources, such as performance or other metrics associated with the computer infrastructure at a future time.
By contrast to conventional monitoring and management tools, embodiments of the present innovation relate to an apparatus and method of behavior forecasting in a computer infrastructure. In one arrangement, for a given attribute of a computer infrastructure, a host device is configured to generate an attribute pattern, such as a workload pattern, using Monte Carlo Simulation. The host device is further configured to arrange or stack the attribute patterns for the given attribute of the computer infrastructure to detect potential forecast issues within the computer infrastructure. Based upon the results of the detection, the host device is configured to display the forecast of the behavior of the computer infrastructure over time.
In one arrangement, the host device provides a forecast dashboard as part of a user interface which allows a systems administrator to have visibility into the behavior issues associated with the computer infrastructure. This can include the metrics of the computer infrastructure (e.g., the performance, efficiency, reliability, and capacity of the computer environment resources) in the future, such as seven days in advance, based on the current configuration of the environment. The results provided by the dashboard can allow the systems administrator to adjust the configuration of the computer infrastructure (e.g., location of the VMs, workloads, number of workloads, capacity, etc.) to minimize or prevent the forecasted issues.
Embodiments of the innovation relate to a method of behavior forecasting in a computer infrastructure. The method includes, for each computer environment resource of the computer infrastructure having a related attribute, deriving a set of clusters for the related attribute of each associated computer environment resource and detecting a learned behavior boundary associated with each set of clusters for each associated computer environment resource. The method includes combining the learned behavior boundaries associated with each set of clusters for each associated computer environment resource to generate a resulting attribute pattern associated with the computer infrastructure; applying an attribute pattern threshold to the resulting attribute pattern; and identifying a forecasted behavior of the computer infrastructure based upon the application of the attribute pattern threshold to the resulting attribute pattern.
Embodiments of the innovation relate to an apparatus for behavior forecasting in a computer infrastructure. In one arrangement, a host device includes a controller having a memory and a processor. The controller is configured to, for each computer environment resource of the computer infrastructure having a related attribute, derive a set of clusters for the related attribute of each associated computer environment resource and detect a learned behavior boundary associated with each set of clusters for each associated computer environment resource. The controller is configured to combine the learned behavior boundaries associated with each set of clusters for each associated computer environment resource to generate a resulting attribute pattern associated with the computer infrastructure. The controller is configured to apply an attribute pattern threshold to the resulting attribute pattern and identify a forecasted behavior of the computer infrastructure based upon the application of the attribute pattern threshold to the resulting attribute pattern.
The foregoing and other objects, features and advantages will be apparent from the following description of particular embodiments of the innovation, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of various embodiments of the innovation.
Embodiments of the present innovation relate to an apparatus and method of behavior forecasting in a computer infrastructure. In one arrangement, for a given attribute of a computer infrastructure, a host device is configured to generate an attribute pattern, such as a workload pattern, using Monte Carlo Simulation. The host device is further configured to arrange or stack the attribute patterns for the given attribute of the computer infrastructure to detect potential forecast issues within the computer infrastructure. Based upon the results of the detection, the host device is configured to display the forecast of the behavior of the computer infrastructure over time.
Each server device 14 can include a controller or compute hardware 20, such as a memory and processor. For example, server device 14-1 includes controller 20-1 while server device 14-N includes controller 20-N. Each controller 20 can be configured to execute one or more virtual machines 22 with each virtual machine (VM) 22 being further configured to execute or run one or more applications or workloads 23. For example, controller 20-1 can execute a first virtual machine 22-1 and a second virtual machine 22-2, each of which, in turn, is configured to execute one or more workloads 23. Each compute hardware element 20, storage device element 18, network communication device element 16, and application 23 is associated with an attribute of the computer infrastructure 11.
The host device 25 is configured as a computerized device having a controller 26, such as a memory and a processor. The host device 25 is disposed in electrical communication with one or more computer infrastructures 11, such as via a network connection, and with a display 55. In one arrangement, the host device 25 is configured to receive, via a communications port (not shown), a set of data elements 24 from at least one computer environment resource 12 of the computer infrastructure 11, where each data element 28 of the set of data elements 24 relates to an attribute of the computer environment resources 12. For example, each data element 28 can relate to the compute level (compute attributes), the network level (network attributes), the storage level (storage attributes), and/or the application or workload level (application attributes) of the computer environment resources 12. Further, each data element 28 of the set of data elements 24 can indicate a metric, such as a performance metric, efficiency metric, reliability metric, or capacity metric, associated with each attribute. For example, a data element 28 can indicate the storage capacity (e.g., an amount of available storage space), the storage performance (e.g., a utilization rate or latency), the storage reliability (e.g., a level of service), and/or the storage efficiency (e.g., an amount of waste) of each corresponding storage device 18-1 through 18-N of the computer infrastructure 11.
Further, the host device 25 is configured to forecast the behavior of the computer infrastructure 11 based upon a baseline set of data which identifies the attributes of the computer environment resources 12 as a model encapsulated in clusters. The following provides an example of the development of the baseline set of data.
During operation, and with additional reference to
While the host device 25 can receive the data elements 24 from the computer infrastructure 11 in a variety of ways, in one arrangement, the host device 25 is configured to receive the data elements 28 from the computer infrastructure 11 as part of a substantially real-time stream. By receiving the data elements 24 as a substantially real-time stream, the host device 25 can monitor activity of the computer infrastructure 11 on a substantially ongoing basis. This allows the host device 25 to detect anomalous activity associated with one or more computer environment resources 12 over time.
As the host device 25 receives the data elements, an analytics platform 27, such as associated with the controller 26 of the host device 25, is configured to generate the baseline set of data 36. In one arrangement, as the host device 25 receives the data elements 24, the host device 25 is configured to utilize a uniformity or normalization function 34 as part of the analytics platform 27 to normalize the data elements 28. Application of the uniformity function to the data elements 24 generates normalized data elements 30. For example, any number of the computer environment resources 12 can provide the data elements 24 to the host device 25 in a proprietary format. In such a case, the normalization function 34 of the host device 25 is configured to convert or normalize the data elements 24 to a standard, non-proprietary format. In another example, as the host device 25 receives the data elements 24 over time, the data elements 24 can be presented with a variety of time scales. For example, for data elements 28 received from multiple network devices 16 of the computer infrastructure 11, the latency of the devices 16 can be presented in seconds (s) or milliseconds (ms). In such an example, the normalization function 34 of the host device 25 is configured to format the data elements 28 to a common time scale.
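As a minimal sketch of the time-scale example above, latency readings reported in seconds or milliseconds could be converted to a common scale as follows. The record fields (`device`, `value`, `unit`), the unit labels, and the function name `normalize_latency` are hypothetical; the source does not specify the format of the data elements 28 or the implementation of the normalization function 34.

```python
# Hypothetical conversion factors to milliseconds; the unit labels are assumptions.
TO_MILLISECONDS = {"s": 1000.0, "ms": 1.0}


def normalize_latency(elements):
    """Return latency readings converted to a common time scale (milliseconds)."""
    normalized = []
    for element in elements:
        factor = TO_MILLISECONDS[element["unit"]]
        normalized.append(
            {"device": element["device"], "latency_ms": element["value"] * factor}
        )
    return normalized


# Two network devices reporting latency on different time scales.
raw = [
    {"device": "16-1", "value": 0.25, "unit": "s"},
    {"device": "16-2", "value": 40.0, "unit": "ms"},
]
normalized = normalize_latency(raw)
print(normalized)
```

After conversion, both readings are expressed in milliseconds and can be compared directly.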
Normalization of the data elements 24 for application of a classification function 38, such as a clustering function 40 as described below, provides equal scale for all data elements 28 and a balanced impact on the distance metric utilized by the classification function (e.g., Euclidean distance metric). Moreover, in practice, normalization of the data elements 28 tends to produce clusters that appear to be roughly spherical, a generally desirable trait for cluster-based analysis.
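The effect of scale on a Euclidean distance metric can be illustrated with a short sketch. Min-max scaling is used here only as one common way to give every dimension an equal scale; the source does not state which scaling the normalization function 34 applies, and the sample values and function name are hypothetical.

```python
import math


def min_max_scale(points):
    """Rescale each dimension of a list of equal-length tuples to the range [0, 1]."""
    dimensions = list(zip(*points))
    lows = [min(d) for d in dimensions]
    highs = [max(d) for d in dimensions]
    return [
        tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(point, lows, highs)
        )
        for point in points
    ]


# Hypothetical (hour, workload) data points with very different scales.
points = [(0.0, 1000.0), (12.0, 1100.0), (24.0, 5000.0)]
scaled = min_max_scale(points)

# Before scaling, the workload dimension dominates the distance;
# after scaling, both dimensions contribute on an equal footing.
print(math.dist(points[0], points[1]))
print(math.dist(scaled[0], scaled[1]))
```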
Next, in one arrangement, the host device 25 is configured to apply, as part of the analytics platform 27, a classification function 38 to the normalized data elements 30 to develop the baseline set of data 36. While the classification function 38 can be configured in a variety of ways, in one arrangement, the classification function 38 is configured as a semi-supervised machine learning function, such as a clustering function 40.
Clustering is the task of grouping a set of objects in such a way that objects in the same group, called a cluster, are more similar to each other than to the objects in other groups or clusters. Clustering is a conventional technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. The grouping of objects into clusters can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. For example, known clustering algorithms include hierarchical clustering, centroid-based clustering (e.g., K-Means Clustering), distribution-based clustering, and density-based clustering. In one arrangement, based upon application of the clustering function 40, the host device 25 is configured to detect anomalies or degradation in performance as associated with the various components or attributes of the computer infrastructure 11.
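For illustration, centroid-based clustering can be sketched with a small implementation of the standard K-means (Lloyd's) iteration. This is a generic textbook version under assumed inputs (fixed initial centroids, a fixed iteration count, and hypothetical data points); it is not presented as the clustering function 40 itself.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    recomputing each centroid as the mean of its assigned points."""
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            groups[nearest].append(point)
        # Keep the previous centroid if a group happens to be empty.
        centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group))
            if group
            else centroids[i]
            for i, group in enumerate(groups)
        ]
    return centroids, groups


# Hypothetical normalized (time, workload) points forming two obvious groups.
points = [(0.1, 0.1), (0.2, 0.1), (0.15, 0.2), (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
centroids, groups = k_means(points, centroids=[(0.0, 0.0), (1.0, 1.0)])
print(centroids)
```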
With application of the clustering function 40 to the normalized data elements 30, the host device 25 stores the baseline set of data 36 as a model encapsulated in clusters. For example, the baseline set of data 36 defines and stores values for each cluster such as mean (e.g., a centroid of a cluster), standard deviation, a maximum x-axis value, a minimum x-axis value, a maximum y-axis value, a minimum y-axis value, size (e.g., the number of data points in the cluster), and a density function (e.g., the data point population density for a cluster) per object. As the maximum and minimum x-axis and y-axis values can apply to the x-axis (e.g., time) and y-axis (e.g., an attribute such as workload), the host device 25 can identify certain characteristics of the attribute, as well as the duration of the attribute, based on the height and width of the cluster.
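The per-cluster values listed above could be held in a small record, sketched below. The class and field names are hypothetical, the standard deviation is computed over the y-axis (attribute) values as an assumption, and the density function is omitted because the source does not define its form.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ClusterSummary:
    mean: tuple      # centroid (x, y)
    stddev: float    # assumption: standard deviation of the y-axis values
    min_x: float
    max_x: float
    min_y: float
    max_y: float
    size: int        # number of data points in the cluster

    @property
    def width(self):
        """Extent along the x-axis (e.g., duration of the behavior)."""
        return self.max_x - self.min_x

    @property
    def height(self):
        """Extent along the y-axis (e.g., range of the attribute)."""
        return self.max_y - self.min_y


def summarize(points):
    """Reduce the data points of one cluster to summary statistics."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return ClusterSummary(
        mean=(statistics.fmean(xs), statistics.fmean(ys)),
        stddev=statistics.pstdev(ys),
        min_x=min(xs), max_x=max(xs),
        min_y=min(ys), max_y=max(ys),
        size=len(points),
    )


# Hypothetical cluster: x is the hour of day, y is a workload metric.
summary = summarize([(12.0, 40.0), (12.5, 50.0), (13.0, 60.0)])
print(summary.width, summary.height, summary.size)
```

Storing only these summary values, rather than the raw points, is what later makes sampling from the cluster necessary.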
As provided above, the host device 25 is configured to forecast the behavior of the computer infrastructure 11 based upon the baseline set of data 36 such as via a forecasting function 42. With continued reference to
As provided above, using K-means clustering, the host device 25 can develop a baseline set of data 36 which identifies the attributes of the computer environment resources 12 as a model encapsulated in trained clusters and that characterizes the behavior of computer infrastructure attributes over time. In one arrangement of the system 10, the host device 25 is configured to leverage these trained clusters to forecast the future behavior of various attributes of the computer infrastructure 11. As will be described in detail below, the host device 25 can be configured to utilize Monte Carlo simulation and stacking to develop computer infrastructure patterns and to predict when aspects of the computer infrastructure 11 (e.g., storage, compute, etc.) will reach capacity.
Each cluster identifies areas of behavior of the computer infrastructure 11 and is defined within the baseline set of data 36 by a mean or centroid, maximum x-value, minimum x-value, maximum y-value, minimum y-value, standard deviation, size (e.g., the number of data points in the cluster), and a density function (e.g., the data point population density for a cluster). However, the baseline set of data 36 does not include actual data points for each cluster. In order to derive patterns associated with various attributes of the computer infrastructure 11, such as workload, for each computer environment resource 12 of the computer infrastructure 11 having the related attribute, the host device 25 is configured to derive a set of clusters 80 for the related attribute of each associated computer environment resource 12.
In one arrangement, with reference to
For example,
In one arrangement, as part of the Monte Carlo sampling, for each cluster listing of a given VM in the baseline set of data 36, the host device 25 is configured to repeatedly select or sample random metric values for the VM 22-1. For example, the host device 25 is configured to generate each Monte Carlo sample for each cluster of the VM 22-1 in a given time segment by selecting a cluster at random from the listing of characteristic clusters provided by the baseline set of data 36 and then sampling a value at random from the selected cluster. While any number of samples can be taken, in one arrangement, the number of samples to be used is approximately 1000.
Returning to
For example, with reference to
For example, assume the set of three clusters 80-1, 80-2, and 80-3 has sizes {20, 40, 40}, respectively. During application of the Monte Carlo sampling function 44, the host device 25 is configured to apply sampling frequencies of {0.2, 0.4, 0.4}, respectively, since the first cluster 80-1 contains ⅕ of the total points (20/100) and each of the second and third clusters 80-2, 80-3 contains ⅖ of the total points (40/100). Once these values are determined, the host device 25 utilizes a framework that facilitates sampling. For example, the host device 25 can consider each relative frequency as a probability mass on the interval [0,1]. Accordingly, if the host device 25 selects a random value from [0,1], it should correspond to the first cluster 80-1 20% of the time, meaning that it should fall on a subinterval having a width of 0.2 (i.e., this may be [0.4, 0.6] or [0.25, 0.45], etc.) which is mapped to the first cluster 80-1. Similarly, the host device 25 utilizes a subinterval of width 0.4 mapped to the second cluster 80-2 and another subinterval of width 0.4 mapped to the third cluster 80-3.
In one arrangement, the cumulative frequencies are a running total of the relative sizes which, for the example above, are {0.2, 0.6, 1.0}. Note that cluster order does not matter, as long as the mapping is correct. For the same example, the host device 25 can instead map the second cluster 80-2, the first cluster 80-1, and the third cluster 80-3 to the cumulative frequencies {0.4, 0.6, 1.0}, respectively. The host device 25 then proceeds with the sampling by selecting a random (e.g., uniform) value x from [0,1] and selecting the cluster mapped to the subinterval, as delimited by the cumulative frequencies, in which x falls.
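The cluster selection step can be sketched as follows, using the cluster sizes {20, 40, 40} from the example. The function names are hypothetical, and the use of a binary search over the cumulative frequencies is an implementation choice that the source does not prescribe.

```python
import bisect
import itertools
import random


def cumulative_frequencies(sizes):
    """Running total of cluster sizes, expressed as fractions of all points."""
    total = sum(sizes)
    return [running / total for running in itertools.accumulate(sizes)]


def select_cluster(cumulative, x):
    """Return the index of the cluster whose subinterval of [0, 1] contains x."""
    return bisect.bisect_left(cumulative, x)


sizes = [20, 40, 40]                      # clusters 80-1, 80-2, 80-3
cumulative = cumulative_frequencies(sizes)
print(cumulative)

# Draw many uniform values and count how often each cluster is selected.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(10000):
    counts[select_cluster(cumulative, rng.random())] += 1
print(counts)
```

With enough draws, the selection counts approach the relative cluster sizes, so larger clusters contribute proportionally more samples.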
The host device 25 can be configured to apply the Monte Carlo sampling function 44 in a variety of ways. In one example, once a cluster 80 has been selected, the host device 25 selects a value at random from the cluster 80. Since the cluster is summarized by the mean (i.e., cMean) and standard deviation (i.e., cStddev) statistics, the host device 25 can regard the metric values in the cluster 80 as having a Gaussian (e.g., normal) distribution, for example, centered at cMean with variation determined by cStddev. For example, the host device 25 can be configured to draw a random sample x from the standard normal distribution (e.g., N(0,1), which identifies a Gaussian distribution centered at 0 with stddev=1). The host device 25 can then determine the sample by computing cStddev*x+cMean.
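A minimal sketch of this draw, assuming hypothetical values for cMean and cStddev and a fixed random seed for repeatability, is shown below. The sample count of 1000 follows the approximate figure given earlier.

```python
import random
import statistics


def sample_from_cluster(c_mean, c_stddev, rng):
    """Draw x from N(0, 1) and shift/scale it to the cluster's statistics."""
    x = rng.gauss(0.0, 1.0)
    return c_stddev * x + c_mean


rng = random.Random(42)                   # fixed seed only for repeatability
samples = [sample_from_cluster(50.0, 5.0, rng) for _ in range(1000)]

# The empirical statistics of the samples should be close to the inputs.
print(statistics.fmean(samples), statistics.stdev(samples))
```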
With continued reference to
For example, assume the host device 25 is configured to determine a workload boundary for a first VM 22-1 during a first time interval 90-1 between 12 PM and 1 PM on the first Tuesday of the month. As indicated in
The host device 25 is configured to perform this operation for the first VM 22-1 over a number of time intervals 90 associated with the VM 22-1. For example, with reference to
The host device 25 is configured to combine each boundary value 102 for each time interval 90 over all time intervals 90 across the entire timeframe (e.g., an entire month). Such a combination results in a learned behavior boundary 100, in this case a learned behavior workload boundary 100-1, for the first VM 22-1, such as illustrated by the graph 150 in
The host device 25 is configured to repeat the process of determining the learned behavior boundary 100 for a given attribute, in this case VM workload, associated with each corresponding computer environment resource 12 of the computer infrastructure 11. For example, assume the case where the computer infrastructure 11 includes a second VM 22-2. Following the process of determining Monte Carlo samples for the clusters associated with the second VM 22-2, the host device 25 can develop a series of workload boundary values 102 for each corresponding time interval 90. The host device 25 combines the boundary values 102 across all time intervals 90 for a given timeframe to form a learned behavior workload boundary 100-2 associated with the second VM 22-2, as shown in
The overall behavior of an attribute of the computer infrastructure 11 can be defined as the sum of the behaviors for a given attribute for each corresponding computer environment resource 12. Therefore, in the case where the attribute relates to VM workload behavior of the computer infrastructure 11, the host device 25 can combine or sum the VM learned behavior workload boundaries 100 to determine the overall workload behavior or attribute pattern of the computer infrastructure 11 over time. For example, with reference to the learned behavior workload boundaries 100-1, 100-2 developed for each VM workload 22-1 and 22-2 in
The host device 25 can be configured to combine the learned behavior boundaries 100 associated with a given attribute of the computer infrastructure 11 in a variety of ways. In one arrangement, as will be described below and with reference to
Initially, with reference to
In order to generate an overall attribute pattern 104, the host device 25 is configured to store the boundary elements 107 from each learned behavior workload boundary 100, 102 to an interval tree (not shown), such as stored by the host device 25 in memory. The host device 25 is further configured to store unique interval endpoints for the boundary elements 107 across all VMs into a list (not shown) that can be traversed in order, such as ascending order.
For each boundary element 107 formed by a pair of contiguous interval endpoints, the host device 25 is configured to access the interval tree to obtain a list of all boundary elements 107 that overlap a given interval 105. Based on the resulting list, the host device 25 is configured to combine (i.e., sum) the overlapping boundaries 107 for each associated computer environment resource 12 to generate corresponding threshold segments.
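The stacking step can be sketched as below. For brevity, this sketch finds overlapping boundary elements with a plain linear scan rather than the interval tree described above; the result is the same, only the lookup cost differs. The tuple layout `(start, end, magnitude)`, the function name, and the sample values are hypothetical.

```python
def stack_boundaries(boundaries):
    """Sum overlapping boundary elements from several resources.

    boundaries: one list per resource of (start, end, magnitude) tuples.
    Returns (start, end, total) segments between contiguous unique endpoints.
    """
    elements = [element for resource in boundaries for element in resource]
    # Unique interval endpoints across all resources, in ascending order.
    endpoints = sorted({t for start, end, _ in elements for t in (start, end)})
    segments = []
    for lo, hi in zip(endpoints, endpoints[1:]):
        # An element overlaps [lo, hi] if it starts before hi and ends after lo.
        total = sum(m for start, end, m in elements if start < hi and end > lo)
        segments.append((lo, hi, total))
    return segments


# Hypothetical learned workload boundaries for two VMs over hours 0 to 12.
vm_1 = [(0, 6, 10), (6, 12, 30)]
vm_2 = [(0, 4, 5), (4, 12, 20)]
segments = stack_boundaries([vm_1, vm_2])
print(segments)
```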
For example, as indicated in
In one arrangement, after generating the resulting threshold segments 103, the host device 25 is configured to merge adjacent segments 103 that have the same magnitude. The resulting collection of segments constitutes the resulting attribute pattern 104 (i.e., the combined workloads of the first and second VMs 22-1, 22-2) used for forecasting.
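Merging adjacent segments of equal magnitude can be sketched as a single pass over segments sorted by start time. The tuple layout `(start, end, magnitude)` and the sample values are hypothetical.

```python
def merge_adjacent(segments):
    """Merge contiguous (start, end, magnitude) segments with equal magnitude."""
    merged = []
    for start, end, magnitude in segments:
        previous = merged[-1] if merged else None
        if previous and previous[1] == start and previous[2] == magnitude:
            # Extend the previous segment instead of adding a new one.
            merged[-1] = (previous[0], end, magnitude)
        else:
            merged.append((start, end, magnitude))
    return merged


pattern = merge_adjacent([(0, 4, 15), (4, 6, 15), (6, 12, 50)])
print(pattern)
```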
With the resulting attribute pattern 104 developed for the VMs of the computer infrastructure 11, the host device 25 is configured to apply an attribute pattern threshold 110 to the resulting attribute pattern 104 to identify a forecasted behavior of the computer infrastructure 11. In one arrangement, with continued reference to
In the case where a portion of the resulting attribute pattern 104 falls below the attribute pattern threshold 110, the host device 25 can identify a positive behavior associated with the computer infrastructure 11 for an associated timeframe. For example, based upon a comparison of the attribute pattern threshold 110 and the resulting attribute pattern 104, the host device 25 can identify the resulting attribute pattern 104 as falling below the attribute pattern threshold 110 on Tuesday between 6:00 AM and 12:00 PM. Based upon the result, the host device 25 can identify this timeframe as having no predicted issues and can provide the result as a notification 300 to a systems administrator regarding this forecasted behavior.
Further, in the case where a portion of the attribute pattern 104 falls above or exceeds the attribute pattern threshold 110, the host device 25 can identify a negative behavior associated with the computer infrastructure 11 for an associated timeframe. For example, based upon a comparison of the attribute pattern threshold 110 and the resulting attribute pattern 104, the host device 25 can identify the workload of the first and second VMs 22-1, 22-2 as exceeding the capacity threshold 110 on Tuesday between 12:00 PM and 1:00 PM. Based upon the result, the host device 25 can identify the computer infrastructure 11 as having excessive workload issues during this timeframe and can output this result as a notification 300 to a systems administrator regarding this forecasted behavior.
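The comparison of the resulting attribute pattern against the threshold can be sketched as follows. The segment layout, the label strings, and the numeric values are hypothetical; hours are expressed on a 24-hour clock for a single day.

```python
def forecast_behavior(pattern, threshold):
    """Label each (start, end, magnitude) segment relative to the threshold."""
    results = []
    for start, end, magnitude in pattern:
        if magnitude > threshold:
            label = "exceeds threshold"
        else:
            label = "no predicted issue"
        results.append((start, end, label))
    return results


# Hypothetical combined workload pattern for Tuesday: 6:00-12:00 and 12:00-13:00.
pattern = [(6, 12, 40), (12, 13, 95)]
results = forecast_behavior(pattern, threshold=80)
for start, end, label in results:
    print(f"{start}:00-{end}:00 {label}")
```

Each labeled segment corresponds to a timeframe that could be reported in a notification.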
As part of the notification 300, such as illustrated in
In one arrangement, the host device 25 is configured to provide the forecast notification 300 as part of a graphical user interface (GUI) 50 which visually identifies the forecasted behavior of the computer infrastructure. For example, with reference to
In one arrangement, as illustrated in
With the development of the attribute pattern 104 and comparison to the attribute pattern threshold 110, the host device 25 can provide a systems administrator with insight into the future behavior of the computer infrastructure 11 based upon its current operation. As such, the host device 25 can improve the operation of the computer infrastructure 11 by allowing the systems administrator to adjust the operation of the computer environment resources 12 prior to the occurrence of an issue.
While various embodiments of the innovation have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the innovation as defined by the appended claims.
This patent application claims the benefit of U.S. Provisional Application No. 62/419,780, filed on Nov. 9, 2016, entitled, “Apparatus and Method of Behavior Forecasting in a Computer Infrastructure,” the contents and teachings of which are hereby incorporated by reference in their entirety.
Number | Date | Country
---|---|---
62/419,780 | Nov 2016 | US