One or more embodiments of the present invention relate to method and apparatus for visualizing the health of datacenter objects.
As a datacenter's virtual infrastructure grows in size and encompasses more and more objects, for example, hardware and virtual machines, monitoring the health of the objects in the virtual infrastructure becomes more and more difficult. Monitoring each object individually, as is typically done today, is no longer a viable option. In addition, existing monitoring solutions are not scalable and are difficult to interpret quickly.
Existing monitoring solutions provide dashboards comprised of lengthy lists of “Top” items, much like a financial stock listing. To use the lists, a user must identify a number of items to follow, or identify a value type of interest, and continually scan a flat list to try to understand what is occurring. While this method may be useful for a small environment or a small list of items, it may quickly encounter a scalability issue. This is because the many lists displayed are often difficult to read at a glance. Instead, they require the user to scan and read each line, and even to scroll the page. Worse still, items that are continually at the top of a list no longer demand attention the next time a user looks at the list.
One or more embodiments of the present invention solve one or more of the above-identified problems by providing method and apparatus for visualizing the health of datacenter objects. Specifically, one or more embodiments of the present invention provide a dashboard that displays an overview of a datacenter's health which helps prioritize, monitor, and troubleshoot problems. In particular, one embodiment of the present invention is a method for visualizing the health of datacenter objects which comprises displaying datacenter objects on a scatterplot of a dashboard wherein one axis of the scatterplot corresponds to problem severity and another axis of the scatterplot corresponds to time.
Three components to prioritizing a problem relating to an object in a datacenter are: (a) determining the importance of the object (i.e., how critical is the problematic object to datacenter operation?); (b) determining the problem's severity (for example and without limitation, is this a total or partial outage? and how many related problems exist?); and (c) determining the problem's duration.
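For example and without limitation, the three components above could be combined into a single ordering as sketched below. This is only an illustration under stated assumptions: the `Problem` record, its field names, the numeric scales, and the choice to rank by importance first, then severity, then duration are all hypothetical and are not taken from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Problem:
    """Hypothetical record for one unhealthy datacenter object."""
    object_name: str
    importance: int        # assumed scale: higher = more critical to operation
    severity: float        # assumed scale: 0.0 (healthy) to 1.0 (total outage)
    started_at: datetime   # when the problem was first detected

    def duration(self, now: datetime) -> timedelta:
        """How long the problem has existed as of `now`."""
        return now - self.started_at


def prioritize(problems, now):
    """Order problems by importance, then severity, then duration (descending)."""
    return sorted(
        problems,
        key=lambda p: (p.importance, p.severity, p.duration(now)),
        reverse=True,
    )


now = datetime(2024, 1, 1, 12, 0)
problems = [
    Problem("vm-7", importance=1, severity=0.9, started_at=now - timedelta(hours=3)),
    Problem("datastore-32", importance=3, severity=0.6, started_at=now - timedelta(hours=1)),
    Problem("host-4", importance=3, severity=0.6, started_at=now - timedelta(hours=2)),
]
ranked = [p.object_name for p in prioritize(problems, now)]
```

A strict ordering of this kind is one design choice among several; a weighted score over the same three components would serve equally well as an illustration.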
One or more embodiments of the present invention provide method and apparatus that help: (a) identify and prioritize problems to be investigated, and (b) identify objects that may be impacted by a particular problem (such impacted objects are referred to herein as “related objects”). In particular, one or more embodiments of the present invention are method and apparatus for generating interactive visualizations that aid a user, for example and without limitation, a datacenter administrator, to prioritize, monitor and troubleshoot problems.
Apparatus for Data Relating to Problems
DMS 110 analyzes the data and identifies health problems relating to the objects. For example and without limitation, DMS 110: (a) may determine how much CPU and memory is allocated to VMs of a server; (b) may compare the data with performance metrics relating, for example and without limitation, to CPU, memory, disk and network performance; and (c) may monitor the hardware health of servers, including components such as, for example and without limitation, fans, system boards, and power supplies. In addition, DMS 110 may utilize customizable alarm triggers in monitoring objects to provide notification when critical error conditions occur.
DMS 110 running on datacenter management server 104 may periodically query each of its managed objects to retrieve the health-related data. To do this, for example and without limitation, DMS 110 may use one or more agents (for example agent 106 shown in
In accordance with one or more embodiments of the present invention, visualization software (“VS”) that provides inventive visualization and functionality associated therewith runs in a computer system that may be inside or outside datacenter 100. For example and without limitation, as shown in
In accordance with one or more embodiments of the present invention, and using any one of a number of methods that are well known to those of ordinary skill in the art, VS 120 accesses data in database 108 that includes: (a) data indicating objects in one or more datacenters, for example, datacenter 100, that are unhealthy (along with identifying information such as, for example and without limitation, the type of object); (b) for the unhealthy objects, alert information with a timestamp indicating when the problem started, for example, when one or more performance metrics deviated from configurable norms by more than configurable tolerances; and (c) data indicating objects that are related to an unhealthy object (for example and without limitation, VMs that access a particular datastore would be considered related). In accordance with one or more embodiments of the present invention, and using any one of a number of methods that are well known to those of ordinary skill in the art, VS 120 accesses database 108 periodically, where the periodicity can be varied in response to user input in accordance with any one of a number of methods that are well known to those of ordinary skill in the art. In addition, and in accordance with one or more further embodiments, for a number of predetermined types of problems (which predetermined problem types are configurable in accordance with any one of a number of methods that are well known to those of ordinary skill in the art), DMS 110 sends an alarm to VS 120 which, in response to the alarm, accesses database 108 to retrieve information relating to a potentially critical problem.
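For example and without limitation, the notion of a metric deviating from a configurable norm by more than a configurable tolerance, and of recording when the problem started, might be sketched as follows. The function names, the sample data, and the use of an absolute difference as the measure of deviation are assumptions made for illustration only.

```python
from datetime import datetime


def deviates(value: float, norm: float, tolerance: float) -> bool:
    """True when a metric is farther from its norm than the allowed tolerance."""
    return abs(value - norm) > tolerance


def detect_problem_start(samples, norm, tolerance):
    """Return the timestamp of the first out-of-tolerance sample, or None.

    `samples` is an iterable of (timestamp, value) pairs in time order.
    """
    for timestamp, value in samples:
        if deviates(value, norm, tolerance):
            return timestamp
    return None


# Hypothetical latency samples (milliseconds) for one datastore.
samples = [
    (datetime(2024, 1, 1, 9, 0), 11.0),
    (datetime(2024, 1, 1, 9, 5), 14.0),
    (datetime(2024, 1, 1, 9, 10), 31.0),
    (datetime(2024, 1, 1, 9, 15), 40.0),
]
start = detect_problem_start(samples, norm=10.0, tolerance=5.0)
```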
In accordance with one or more embodiments of the present invention, a user connects to VS 120 over a private network connection using a browser in accordance with any one of a number of methods that are well known to those of ordinary skill in the art. In response, and in accordance with one or more such embodiments, VS 120 interacts with the user through a user interface (UI) in accordance with any one of a number of methods that are well known to those of ordinary skill in the art. In accordance with one or more such embodiments, the user uses a computing resource with a display (shown in
Visualization
One or more embodiments of the present invention are method and apparatus for mapping object importance, problem severity, and problem duration on a dashboard that includes a scatterplot. As used herein, and as used in the art, a dashboard is a term that generally refers to a display on which real-time information is collated from various sources, for example and without limitation, in a datacenter. The metaphor of a dashboard is adopted here to emphasize the nature of the data being displayed on a page; it is a real-time analysis as to how a datacenter is operating, just as an automobile dashboard displays real-time information about the performance of that vehicle.
In accordance with one or more embodiments of the present invention, the horizontal axis corresponds to time, and each object is first displayed at the time the object was determined to be unhealthy. Thus, the position of an object vis-a-vis the horizontal axis may provide a measure of problem duration or age (for example and without limitation, as shown in
In accordance with one or more embodiments of the present invention, and as shown in
In accordance with one or more embodiments of the present invention, when a user selects an object with a cursor (for example and without limitation, by clicking a mouse when the cursor is over the object) that is provided using any one of a number of methods that are well known to those of ordinary skill in the art, an indication (for example and without limitation, a point) is placed on the time slider using any one of a number of methods that are well known to those of ordinary skill in the art, which indication identifies when the problem with the object was first detected. In accordance with one or more embodiments of the present invention, the user may move a time indicator (for example and without limitation, time indicator 210 shown in
In accordance with one or more embodiments of the present invention, a “time player” acts to replay the scatterplot in accordance with any one of a number of methods that are well known to those of ordinary skill in the art, for example and without limitation, from the problem start time to the present time for a selected object.
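For example and without limitation, a replay from the problem start time to the present time might step a time indicator forward and redraw the objects that were already unhealthy at each step. The sketch below assumes a fixed step size and hypothetical helper names (`replay_times`, `visible_at`); it only computes which objects would appear in each frame and does not draw anything.

```python
from datetime import datetime, timedelta


def replay_times(problem_start, now, step):
    """Yield time-indicator positions for a replay from problem_start to now."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current = problem_start
    while current < now:
        yield current
        current += step
    yield now  # always finish on the present time


def visible_at(objects, moment):
    """Names of objects whose problem had already started at `moment`."""
    return [name for name, started in objects if started <= moment]


# Hypothetical (name, problem start time) pairs.
objects = [
    ("datastore-32", datetime(2024, 1, 1, 9, 0)),
    ("vm-7", datetime(2024, 1, 1, 9, 20)),
    ("vm-9", datetime(2024, 1, 1, 9, 40)),
]
frames = [
    visible_at(objects, t)
    for t in replay_times(
        datetime(2024, 1, 1, 9, 0),
        datetime(2024, 1, 1, 10, 0),
        timedelta(minutes=30),
    )
]
```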
In accordance with one or more embodiments of the present invention, the type of problem objects displayed may be selected so as to provide a filter for the display. For example and without limitation, as shown in
In accordance with one or more embodiments of the present invention, another indicator of priority (object importance) is represented on the scatterplot by the size of a display object on the plot (for example and without limitation, higher importance problems are larger in size). For example and without limitation, as a configurable feature (typically initialized at installation using any one of a number of methods that are well known to those of ordinary skill in the art), the user would enter data, for example and without limitation, into a table in accordance with any one of a number of methods that are well known to those of ordinary skill in the art that associates problems of a particular type with a particular type and size of display object. For example and without limitation, a display object may be a circle, a rectangle, a hexagon, a triangle and so forth. In accordance with one or more further embodiments of the present invention, the type of display object or the color of a display object on the scatterplot may relate to the type of object, for example and without limitation, a virtual machine (VM) or a datastore while, as set forth above, object importance relates to size. In accordance with one or more such further embodiments, the type and/or the color of a display object are configurable parameters.
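For example and without limitation, a configurable table of the kind described above might be applied as in the following sketch, which maps one unhealthy object to the attributes of its display object. The table contents, the `Marker` record, the linear size rule, and the placement of severity on the vertical axis are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical configuration table: object type -> (shape, color).
STYLE_BY_TYPE = {
    "vm": ("circle", "blue"),
    "datastore": ("rectangle", "orange"),
}
DEFAULT_STYLE = ("triangle", "gray")


@dataclass(frozen=True)
class Marker:
    x: float      # horizontal position: hours since the problem started
    y: float      # vertical position (assumed): problem severity
    size: float   # display object size: grows with object importance
    shape: str
    color: str


def to_marker(object_type, importance, severity, age_hours,
              base_size=6.0, size_step=4.0):
    """Map one unhealthy object onto display object attributes."""
    shape, color = STYLE_BY_TYPE.get(object_type, DEFAULT_STYLE)
    size = base_size + size_step * importance
    return Marker(x=age_hours, y=severity, size=size, shape=shape, color=color)


marker = to_marker("datastore", importance=3, severity=0.8, age_hours=1.5)
```

Keeping shape and color in a table separate from the size rule mirrors the description, in which object importance relates to size while the type of object relates to the type or color of the display object.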
In accordance with one or more further embodiments, and using any one of a number of methods that are well known to those of ordinary skill in the art, the display objects may comprise text labels which, in a particular case, have the name of a problematic object. By replacing points or dots with text labels, users can quickly identify each problem in context. This also reduces the number of drilldown steps needed to identify problems, and provides a better overview of the present situation. However, if the number of objects increases to such an extent that the text labels overlap and obscure one another, the text labels can be replaced by more scalable points. For example and without limitation, as shown in
In accordance with one or more embodiments of the present invention, visualization of relationships among objects is created when a user selects an object on the scatterplot (for example and without limitation, by clicking a mouse when a cursor is over the object) by providing directed edges on the scatterplot from the selected object to related objects using any one of a number of methods that are well known to those of ordinary skill in the art (i.e., as long as the related objects are displayed on the scatterplot in a time duration encompassed thereby). For example and without limitation, types of objects that are related may be configurable—typically this data is initialized at installation using any one of a number of methods that are well known to those of ordinary skill in the art, and a user could change this data, for example and without limitation, using the UI in accordance with any one of a number of methods that are well known to those of ordinary skill in the art.
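For example and without limitation, the directed edges might be derived as sketched below, where related objects that are not currently displayed on the scatterplot are skipped. The function name and the shape of the `related` and `positions` mappings are hypothetical.

```python
def related_edges(selected, related, positions):
    """Directed edges from `selected` to each related object that is plotted.

    `related` maps an object name to the names of objects related to it.
    `positions` maps the names of currently displayed objects to (x, y) points.
    """
    if selected not in positions:
        return []
    return [
        (positions[selected], positions[other])
        for other in related.get(selected, [])
        if other in positions  # skip related objects outside the displayed window
    ]


# Hypothetical relationships and plot positions; vm-12 is not displayed.
related = {"datastore-32": ["vm-7", "vm-9", "vm-12"]}
positions = {"datastore-32": (1.0, 0.8), "vm-7": (0.5, 0.4), "vm-9": (0.7, 0.6)}
edges = related_edges("datastore-32", related, positions)
```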
In accordance with one or more further embodiments of the present invention, the user may, for example and without limitation, use a cursor to hover over an object, and in response, and in accordance with one or more such embodiments, additional information is displayed (referred to herein as a “tooltip”) using any one of a number of methods that are well known to those of ordinary skill in the art. In accordance with one or more such embodiments, the additional information in the display tooltip may include problem metrics obtained from, for example and without limitation, database 108. For example and without limitation, the metrics to be displayed for a particular type of object would be configurable parameters that would be determined at installation and would be configurable by a user using the UI in accordance with any one of a number of methods that are well known to those of ordinary skill in the art. For example and without limitation, as shown in
In accordance with one or more embodiments of the present invention, a background of the scatterplot may be divided into several regions where the backgrounds are displayed in different colors using any one of a number of methods that are well known to those of ordinary skill in the art, which areas correspond to different degrees of problem severity. In accordance with one or more such embodiments, the several regions are displayed in one background color but having different intensities in the different regions. Alternatively, the background could reflect an intensity gradient over the scatterplot. In further addition, one or more embodiments are combinations of one or more of the foregoing. For example and without limitation, as shown in
In accordance with one or more embodiments of the present invention, a “stacked chart” is added to the display shown in
In accordance with one or more embodiments of the present invention, a list of alerts is displayed along with the scatterplot using any one of a number of methods that are well known to those of ordinary skill in the art, which list of alerts details the specific problems afflicting the displayed objects. For example and without limitation,
In accordance with one or more embodiments of the present invention, the scatterplot is displayed in conjunction with one or more toolbars using any one of a number of methods that are well known to those of ordinary skill in the art. In accordance with one or more such embodiments, one toolbar may be used to change the visualization type using any one of a number of methods that are well known to those of ordinary skill in the art (for example and without limitation, objects are displayed using text (and color of the text), or objects are displayed as particular shapes such as, for example and without limitation, a circle, rectangle, hexagon, or triangle (and the color of the shapes)), and one toolbar may be used to hide or reveal objects by type using any one of a number of methods that are well known to those of ordinary skill in the art (for example and without limitation, a list of objects is displayed and a checkbox is used to indicate which of the objects are to be displayed in the scatterplot). For example and without limitation, refer to top right-hand panel 410 of
In accordance with one or more embodiments of the present invention, a user can filter the display, for example and without limitation, by object type, problem severity, text used to identify objects and so forth. In accordance with one or more such embodiments, filter criteria may be specified by a user using the UI in accordance with any one of a number of methods that are well known to those of ordinary skill in the art. Further, in accordance with one or more such embodiments, the filtering can be dynamic so that the display reflects criterion matching results as soon as the user enters a search criterion as opposed to having to enter all filter criteria (and selecting a “submit button”) before the display reflects criteria matching results. In addition, in accordance with one or more such embodiments, filter criteria may be specified so as to apply to tabular data (such as alerts) displayed, for example and without limitation, below the scatterplot. For such cases, the tabular display would be “in sync” with the scatterplot, i.e., if the user filters something out of the scatterplot, it would also be filtered out of the tabular display.
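For example and without limitation, keeping the tabular display in sync with the scatterplot might be achieved by deriving both from a single filtered list, as in the sketch below. The `Alert` record, the criteria names, and the substring match on object names are assumptions; dynamic filtering would correspond to calling `apply_filter` again each time the user changes a criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert row for one unhealthy object."""
    object_name: str
    object_type: str
    severity: float
    message: str


def matches(alert, object_types=None, min_severity=0.0, text=""):
    """True when an alert satisfies every filter criterion that is set."""
    if object_types is not None and alert.object_type not in object_types:
        return False
    if alert.severity < min_severity:
        return False
    return text.lower() in alert.object_name.lower()


def apply_filter(alerts, **criteria):
    """Return (plotted object names, table rows) derived from one filtered list."""
    kept = [a for a in alerts if matches(a, **criteria)]
    plotted = sorted({a.object_name for a in kept})
    return plotted, kept


alerts = [
    Alert("datastore-32", "datastore", 0.8, "High latency"),
    Alert("vm-7", "vm", 0.4, "Slow disk I/O"),
    Alert("vm-9", "vm", 0.7, "Slow disk I/O"),
]
plotted, rows = apply_filter(alerts, object_types={"vm"}, min_severity=0.5)
```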
In accordance with one or more embodiments of the present invention, a scatterplot is updated when one or more of the following occurs: a problem with an object is detected, a problem severity changes, a problem severity falls below a user-configurable value (for example and without limitation, the object becomes healthy), time changes by a user-configurable amount (for example and without limitation, the amount may equal a time interval along the time axis of the scatterplot), a time indicator on a time slider is moved, or a problem importance changes.
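For example and without limitation, the update conditions listed above might be checked by comparing the state from which the scatterplot was last drawn with the current state. In the sketch below, the `PlotState` record, the default threshold below which an object is treated as healthy, and the default refresh interval are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PlotState:
    """Hypothetical snapshot of the data behind the scatterplot."""
    severities: dict       # object name -> current problem severity
    importances: dict      # object name -> object importance
    slider_time: datetime  # position of the time indicator on the time slider


def visible(severities, healthy_below):
    """Objects whose severity is at or above the user-configurable value."""
    return {name: s for name, s in severities.items() if s >= healthy_below}


def needs_update(old, new, last_drawn, now,
                 healthy_below=0.1, refresh_every=timedelta(minutes=5)):
    """True when any of the update conditions has occurred since the last draw."""
    if visible(old.severities, healthy_below) != visible(new.severities, healthy_below):
        return True  # problem detected, severity changed, or object became healthy
    if old.importances != new.importances:
        return True  # an importance changed
    if old.slider_time != new.slider_time:
        return True  # the time indicator was moved
    return now - last_drawn >= refresh_every  # time advanced by the configured amount


t0 = datetime(2024, 1, 1, 10, 0)
old = PlotState({"datastore-32": 0.8}, {"datastore-32": 3}, t0)
worse = PlotState({"datastore-32": 0.9}, {"datastore-32": 3}, t0)
changed = needs_update(old, worse, last_drawn=t0, now=t0 + timedelta(minutes=1))
```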
Method of Using the Visualization
The following illustrates a method of using the inventive visualization to detect and resolve a problem in a datacenter. In other words, it shows how a datacenter administrator can detect a problem, and explore the problem space using a display with a scatterplot and a list of alerts.
Consider the following problem scenario, in which resources in three server clusters, including a production server cluster (i.e., a group of servers that are running VMs used to serve customers in production), are impacted. The underlying cause of the problem is that a misconfigured storage array was overloaded due to a sudden spike in traffic from multiple sources.
A first step in the method entails detecting the problem. To do this, and in accordance with one or more embodiments of the present invention, the datacenter administrator accesses VS 120 and, in response, VS 120 provides a display (refer to
As shown in the example of
The next step may entail obtaining more information regarding datastore-32. To do this, in accordance with one or more embodiments of the present invention, the datacenter administrator uses a cursor provided by VS 120 in accordance with any one of a number of methods that are well known to those of ordinary skill in the art to select an object, in this case datastore-32, by clicking a mouse when the cursor appears over the object. In response, (as described above and as shown in
The next step may entail further exploring the problem space. To do this, and in accordance with one or more embodiments of the present invention, the datacenter administrator hovers over the other objects to obtain tooltips displaying more information regarding other objects in the scatterplot.
The next step may entail “rewinding” to a time at which the problem was first detected. To do this, in accordance with one or more embodiments of the present invention, the datacenter administrator uses a time slider on the display underneath the scatterplot. The datacenter administrator can also use the stacked chart to identify the times at which problems first appeared by looking for changes in the stacked chart. Next, in accordance with one or more embodiments of the present invention, the datacenter administrator drags the slider time indicator to the left, and VS 120 shifts the whole “cloud” of objects right, and hence, back in time (as described above and shown in
The next step may entail replaying to verify the cause of the problem and its impact. As shown in
The next step may entail drilling down to further understand the problem. For example and without limitation, the datacenter administrator may use DMS 110 to: (a) determine who the owners of the VMs are, and then notify the owners to investigate what might have caused the traffic increase as described above; and (b) drill down to the storage array configuration and redistribute the VMs among several other datastores to reduce the load in accordance with any one of a number of methods that are well known to those of ordinary skill in the art.
One or more embodiments of the present invention, including embodiments described herein, may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities; usually, though not necessarily, these quantities may take the form of electrical or magnetic signals where they, or representations of them, are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the invention may be useful machine operations. In addition, one or more embodiments of the invention also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
One or more embodiments of the present invention, including embodiments described herein, may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc), a CD-ROM, a CD-R, a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Many changes and modifications may be made to the description set forth above by those of ordinary skill in the art while remaining within the scope of the invention. In addition, apparatus, methods and mechanisms suitable for fabricating one or more embodiments of the present invention have been described above by providing specific, non-limiting examples and/or by relying on the knowledge of one of ordinary skill in the art. Apparatus, methods, and mechanisms suitable for fabricating various embodiments or portions of various embodiments of the present invention described above have not been repeated, for sake of brevity, wherever it should be well understood by those of ordinary skill in the art that the various embodiments or portions of the various embodiments could be fabricated utilizing the same or similar previously described apparatus, methods and mechanisms.
As such, the scope of the invention should be determined with reference to the appended claims along with their full scope of equivalents. Accordingly, the described embodiments are to be considered as exemplary and illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. The claim elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
Many changes and modifications may be made to the description set forth above by those of ordinary skill in the art while remaining within the scope of the invention. In addition, methods and mechanisms suitable for fabricating embodiments of the present invention have been described above by providing specific, non-limiting examples and/or by relying on the knowledge of one of ordinary skill in the art. Methods and mechanisms suitable for fabricating various embodiments or portions of various embodiments of the present invention described above have not been repeated, for sake of brevity, wherever it should be well understood by those of ordinary skill in the art that the various embodiments or portions of the various embodiments could be fabricated utilizing the same or similar previously described materials, methods or mechanisms. As such, the scope of the invention should be determined with reference to the appended claims along with their full scope of equivalents.
In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).