Data center personnel face a growing challenge in managing multiple servers and other information technology equipment in a large data center. Multiple aspects of operation include partitions, performance, environmental measurements, and failure data which are analyzed in combination as measures of relative health of the servers.
Typical techniques for modifying a hardware configuration in a computing or electronic system are initiated from a tool in the operating system or involve physical procedures such as pulling a switch, waiting for a signal (LED) indicating permission to proceed, and physically inserting, removing, or replacing the hardware component. Both operating system tools and physical procedures assume a priori knowledge of the system under operation and are typically not trusted by users or customers. A common fear is that user error will cause a system crash.
An embodiment of a management system includes a management application that is executable in a central management station and is operative to manage hardware components of one or more systems using visual graphics that present assembly and repair functionality in combination with system health information. The management application is further operative to present a visual set of step-by-step instructions through the visual graphics for addition, deletion, and/or replacement of the managed hardware components.
Embodiments of the invention relating to both structure and method of operation may best be understood by referring to the following description and accompanying drawings:
A visual troubleshooting and diagnostic tool enables online management of addition, deletion, and/or replacement of hardware components. Accordingly, the visual troubleshooting and diagnostic tool can be used to initiate replacement and indicate input/output (I/O) card status.
Typically, tools that implement online add/delete/replace functionality for components such as I/O cards and cells run under direction of an operating system and are limited by operator rights of a system. However, repair procedures and operations that modify resources in the system are better associated with operations of an information technology (IT) administrator. The illustrative visual troubleshooting and diagnostic tool operates as an IT administrator application that enables a visual set of step-by-step procedures for troubleshooting, replacing, and exchanging hardware in the system. The functionality of online add/delete/replace (OL*) operations is better suited to an IT administrator application such as the disclosed visual troubleshooting and diagnostic tool than to a process that resides on the operating system, since control of hardware resources is thus placed in the domain of operators most capable and appropriate for hardware management.
An illustrative visual troubleshooting and diagnostic tool presents step-by-step instructions through a pictorial display for the replacement of hardware in a system. Furthermore, the visual troubleshooting and diagnostic tool has linkages to the system undergoing repair and/or replacement, so that steps are tailored to the particular procedure.
Referring to
In an example implementation, the visual troubleshooting and diagnostic tool displays a pictorial view of a system 108, for example with color coding, which illustrates the specific components in the system 108 along with operational information.
In the illustrative implementation, the management system 100 can further comprise the central management station 104 which is operative for executing the management application 102 and one or more workload managers 110 that are communicatively coupled to the central management station 104 and operate in combination with the central management station 104 to communicate system information. The managed systems 108 generally comprise an operating system 112, one or more hardware components 106, and a management processor 114. The management application 102 controls management of hardware components 106 at the central management level by online actions.
The management application 102 can determine status of hardware components 106 at the central management level by online actions, for example by selectively adding, deleting, and/or replacing hardware components 106 by online actions based on status.
The management application 102 forms part of a user interface that presents a visual set of step-by-step instructions to a user that enable addition, deletion, and/or replacement by forming a pictorial representation of the managed hardware components 106.
The management application 102 enables graphical, system-level views that combine the physical details of a system 108 with error and environment management information, facilitating assessment of the root cause of system failures. For example, room temperature can vary over time, causing the system to overheat and generate processor parity errors. The illustrative management application 102 can be used to assist replacement of hardware components 106 in light of environmental considerations such as temperature.
In an example embodiment, the management application 102 can be used to access information relating to environmental conditions, error status, physical details, and/or management information which are determined by the management processor 114 internal to the system 108. The management application 102 pictorially displays the accessed environmental conditions, error status, physical details, and/or management information. The management application 102 also presents or displays the visual set of step-by-step instructions through the visual graphics to enable the addition, deletion, and/or replacement of the managed hardware components 106 according to an instruction set that is specific to the respective environmental conditions, error status, physical details, and/or management information of the monitored system 108.
Examples of environmental conditions, error status, physical details, and/or management information that are accessed by the management application 102 include, but are not limited to, field replaceable unit (FRU) loading, thermal data, temperature data, air flow data, power consumption, and/or error conditions.
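As a minimal sketch of how an instruction set might be tailored to accessed conditions, the following Python example assembles step captions for replacing an I/O card. The `SystemStatus` fields, the `build_replace_steps` function, the 45 degree threshold, and the caption wording are all hypothetical and are not taken from the described embodiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemStatus:
    # Hypothetical snapshot of values reported by a management processor.
    slot_temperature_c: float
    slot_powered: bool
    error_logged: bool


def build_replace_steps(status: SystemStatus, max_safe_temp_c: float = 45.0) -> List[str]:
    """Return ordered step captions for replacing an I/O card.

    Steps are added or skipped depending on the accessed status, so the
    displayed sequence is specific to the monitored system. The threshold
    default is an illustrative assumption.
    """
    steps = []
    if status.error_logged:
        steps.append("Review the logged error for this slot before proceeding.")
    if status.slot_powered:
        steps.append("Request power-down of the slot and wait for confirmation.")
    if status.slot_temperature_c > max_safe_temp_c:
        steps.append("Allow the card to cool before handling.")
    steps.append("Remove the card from the slot.")
    steps.append("Insert the replacement card.")
    steps.append("Request power-up of the slot and verify status.")
    return steps
```

A powered, warm slot yields a longer sequence than a cool, unpowered one, which is the tailoring behavior the paragraphs above describe.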
FRU loading, thermal data including temperature and air flow, power consumption, and errors are examples of accessible records that the visual troubleshooting and diagnostic tool can display in layers.
Thus the illustrative management system 100 combines a visual tool for assembly and repair with depiction of system health information.
The management application 102 can perform many operations including online addition, deletion, and/or replacement of hardware components 106 and accessing information from a selected target system using a handshake interaction. The management application 102 can then display a pictorial slide show of steps for performing the online addition, deletion, and/or replacement operations which specify particular actions to perform and particular times and/or conditions to perform the actions.
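One way to picture a slide show whose steps appear only at the appropriate time or condition is sketched below. The `Step` class, the `run_slideshow` generator, and the `read_state` callback (standing in for the handshake with the target system) are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class Step:
    caption: str
    # Predicate over the target system state; True when the step may be shown.
    ready: Callable[[Dict[str, bool]], bool]


def run_slideshow(
    steps: List[Step],
    read_state: Callable[[], Dict[str, bool]],
    max_polls: int = 10,
) -> Iterator[str]:
    """Yield each caption only once its condition holds on the target system."""
    for step in steps:
        for _ in range(max_polls):
            if step.ready(read_state()):
                yield step.caption
                break
        else:
            # The condition was never observed within the polling budget.
            raise TimeoutError(f"Condition never met for step: {step.caption}")
```

Polling with a fixed budget is a simplification chosen to keep the sketch short; an event-driven notification from the target system would serve the same purpose.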
In an example operation for pictorially representing the managed hardware components, the management application 102 can overlay selected measured and recorded data that is acquired from manageability tools onto a topographic, scalable, graphic image of selected managed hardware components. The management application 102 also enables a user to selectively zoom, pan, and/or view a selected physical representation of the managed hardware components in a graphic image. The graphic image can be displayed online or as a standalone utility.
In some implementations or in some conditions, the management application 102 can also enable a user to selectively rotate the displayed graphic image, disassemble it into subassemblies, and/or zoom in or out of it.
In some further embodiments, the management application 102 can pictorially represent the managed hardware components 106 by enabling a user to acquire measured data and a priori known information from manageability tools and construct modeled data from the acquired measured data and a priori known information, then visually displaying the modeled data.
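The description does not specify how modeled data is constructed. Purely as an illustrative assumption, the sketch below combines measured sensor temperatures with a priori known sensor positions to estimate the temperature at an unsensed location by inverse-distance weighting. The function name and the choice of weighting are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_temperature(point: Point, sensors: List[Tuple[Point, float]]) -> float:
    """Estimate temperature at `point` from (position, temp_c) sensor readings.

    Positions are the a priori known information; temperatures are the
    measured data. Inverse-distance-squared weighting is an assumed model.
    """
    if not sensors:
        raise ValueError("at least one sensor reading is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), temp_c in sensors:
        distance = math.hypot(point[0] - x, point[1] - y)
        if distance == 0.0:
            # The point coincides with a sensor, so use its reading directly.
            return temp_c
        weight = 1.0 / (distance ** 2)
        weighted_sum += weight * temp_c
        weight_total += weight
    return weighted_sum / weight_total
```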
The management system 100 can be used in combination with provisioning tools such as a workload manager for visual performance monitoring at the FRU and system level, enabling a user to identify inter-component bottlenecks and to interactively manage load balancing when adding, deleting, and/or replacing hardware components 106.
The add/delete/replace (OL*) functionality can be used in combination with tools for performance monitoring, workload balancing, and application partitioning of dynamic processes. Data acquired from memory, I/O, and processor sources can be used to determine interconnect performance and represented visually to enable improved tuning of the system. For example, adding memory can be indicated to improve performance when dual in-line memory modules (DIMMs) are overloaded while busses to the DIMMs are not.
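The DIMM example can be expressed as a simple rule. The sketch below is hypothetical: the `InterconnectSample` fields, the `recommend_memory_add` function, and both threshold defaults are assumptions chosen for illustration, not values from the described system.

```python
from dataclasses import dataclass


@dataclass
class InterconnectSample:
    # Utilization expressed as a fraction between 0.0 and 1.0.
    dimm_utilization: float
    bus_utilization: float


def recommend_memory_add(
    sample: InterconnectSample,
    dimm_threshold: float = 0.85,
    bus_threshold: float = 0.60,
) -> bool:
    """Indicate adding memory when DIMMs are overloaded but the busses are not.

    If the busses are also saturated, more DIMMs would not relieve the
    bottleneck, so no recommendation is made.
    """
    dimms_overloaded = sample.dimm_utilization >= dimm_threshold
    busses_have_headroom = sample.bus_utilization < bus_threshold
    return dimms_overloaded and busses_have_headroom
```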
Referring to
For example, the visual set of step-by-step instructions can be presented 204 for addition, deletion, and/or replacement by a pictorial representation of the managed hardware components.
Management of hardware components can be controlled at the central management level by online actions.
Referring to
Referring to
In an example embodiment, the visual set of step-by-step instructions can be presented 204 through the visual graphics for addition, deletion, and/or replacement of the managed hardware components according to an instruction set that is specific to the accessed environmental conditions, error status, physical details, and/or management information of the at least one system.
Examples of environmental conditions, error status, physical details, and/or management information that can be accessed include field replaceable unit (FRU) loading, thermal data, temperature data, air flow data, power consumption, and/or error conditions.
Referring to
Referring to
In some implementations, a user can be enabled to selectively rotate the displayed graphic image, disassemble a structure into one or more subassemblies, and/or zoom in or out of the image.
The illustrative system and operating methods reduce the amount of user error in performing the online add/delete/replace (OL*) operation. Visual pictorial display of steps, at the time the steps are to be performed, enables an inexperienced user or technician to precisely perform correct operations, rather than relying on a priori knowledge and voluminous documentation.
Computing power can be used to present information visually to enable discoveries in diverse fields including medical imaging and scientific modeling in physics, chemistry, and biology. Visualization conveys a tremendous amount of information at one time, enabling the user to make rapid connections and interpretations. The illustrative visual troubleshooting and diagnostic tool enables the power of visualization to be applied to management of the servers.
Referring to
The illustrative visual troubleshooting and diagnostic tool can operate in accordance with a concept of overlaying various results which are measured and recorded by manageability tools onto a topographic, scalable, graphic image of a server or system. The scalable graphical image enables a user to zoom, pan, and view from various angles a physical representation of the server. The graphical image can be viewed over a web page or as a stand-alone utility. A topographic-like map of the parameters is overlaid on the image of the server. Parameters can be individually or collectively viewed. The visual representation of the parameters can be shown as absolute, relative, or as importance maps.
The visual troubleshooting and diagnostic tool enables display of environmental conditions and error states of a computer system over time, for example temperature changes and component errors, enabling complex cause and effect relationships to be more easily determined from the data.
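A minimal sketch of relating component errors to temperature changes over time follows. The log formats, the function name, and the fixed look-back window are assumptions made for illustration; the description does not prescribe a particular analysis.

```python
from typing import List, Tuple


def errors_following_heat(
    temperature_log: List[Tuple[float, float]],
    error_times: List[float],
    threshold_c: float,
    window_s: float,
) -> List[float]:
    """Return error timestamps preceded by a temperature excursion.

    temperature_log holds (timestamp_s, temp_c) samples. An error is flagged
    when any sample within `window_s` seconds before it reaches `threshold_c`.
    """
    flagged = []
    for error_time in error_times:
        preceding = [
            temp_c
            for timestamp_s, temp_c in temperature_log
            if error_time - window_s <= timestamp_s <= error_time
        ]
        if preceding and max(preceding) >= threshold_c:
            flagged.append(error_time)
    return flagged
```

Flagged errors are candidates for a thermal cause and could be highlighted on the timeline; a match within the window suggests, but does not prove, a cause and effect relationship.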
Referring to
Various embodiments can display performance information and capacity-on-demand data. In some embodiments, the system can access information from provisioning tools such as workload management tools.
For example, a parameter such as temperature 330 can be represented as a rainbow of hues from red to blue representing absolute temperatures in the box. Thus a hot component is shown in a hotter hue than a cool component.
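The absolute view can be sketched as a mapping from temperature to a hue between blue and red. The example below uses the Python standard library `colorsys` module; the function name and the 20 to 90 degree display range are hypothetical, and assigning red to the hot end is an assumption consistent with the "hot" and "cool" color language used elsewhere in this description.

```python
import colorsys
from typing import Tuple


def temperature_to_rgb(
    temp_c: float, min_c: float = 20.0, max_c: float = 90.0
) -> Tuple[int, int, int]:
    """Map an absolute temperature to an RGB color from blue (cool) to red (hot)."""
    if max_c <= min_c:
        raise ValueError("max_c must be greater than min_c")
    # Normalize into [0, 1], clamping readings outside the display range.
    fraction = (temp_c - min_c) / (max_c - min_c)
    fraction = min(1.0, max(0.0, fraction))
    # In HSV, hue 0 is red and hue 2/3 is blue, so hotter means smaller hue.
    hue = (1.0 - fraction) * (2.0 / 3.0)
    red, green, blue = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(red * 255), round(green * 255), round(blue * 255))
```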
In other embodiments, conditions, or applications, a relative view can be used. A component can be shown with characteristics compared to baseline values. For example, although component A may be hotter than component B, component A may be in specification and B not in compliance. Thus, component A is shown with a “cool” color and component B with a “hot” color.
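The relative view can be illustrated with a comparison against a per-component baseline. In this sketch the function name and the numeric limits are hypothetical; only the "hot" and "cool" outcome for components A and B follows the example in the text.

```python
def relative_view(measured_c: float, spec_limit_c: float) -> str:
    """Return 'hot' when a reading exceeds its own specification limit, else 'cool'."""
    return "hot" if measured_c > spec_limit_c else "cool"


# Component A runs hotter in absolute terms but is within its specification.
component_a = relative_view(measured_c=80.0, spec_limit_c=95.0)
# Component B runs cooler in absolute terms but exceeds its specification.
component_b = relative_view(measured_c=60.0, spec_limit_c=55.0)
```

Here component A maps to "cool" and component B to "hot", even though A has the higher absolute temperature.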
In still other examples, a parameter can be compared to ranges of importance and can thus be flagged according to OK, Warning, or Critical conditions with three colors to represent different cases.
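A sketch of the importance view follows. The threshold parameters and the specific colors are assumptions; the text states only that three colors represent the OK, Warning, and Critical cases.

```python
# Hypothetical color assignment for the three conditions.
STATUS_COLORS = {"OK": "green", "Warning": "yellow", "Critical": "red"}


def classify(value: float, warning_at: float, critical_at: float) -> str:
    """Flag a parameter as OK, Warning, or Critical by comparing it to ranges."""
    if critical_at < warning_at:
        raise ValueError("critical_at must not be below warning_at")
    if value >= critical_at:
        return "Critical"
    if value >= warning_at:
        return "Warning"
    return "OK"


def importance_color(value: float, warning_at: float, critical_at: float) -> str:
    """Return the display color for a parameter value."""
    return STATUS_COLORS[classify(value, warning_at, critical_at)]
```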
The different display view types can be used as appropriate according to application and can be combined and/or matched for various parameters tracked in a system.
The visual troubleshooting and diagnostic tool enables rapid production and display of meaningful data that can be acted on by pointing to a physical spot in the box. The visual troubleshooting and diagnostic tool thus can reduce overhead, and usage of labels, manuals, and foreknowledge of the system.
In an example embodiment, the visual troubleshooting and diagnostic tool can be implemented by a framework of graphical presentation tools using Macromedia FLASH to present physical views of a system along with an assembly process. In another example, the visual troubleshooting and diagnostic tool can be constructed using the emerging standard of AJAX (Asynchronous JavaScript and XML) as a basis. The FLASH-based framework enables display of a system with either bit-mapped or vector-mapped graphic images, allowing for animation, and permitting overlays of multiple images. Product images can be rotated, disassembled into subassemblies, and magnified for close inspection (zoom in/out). Animation can be added for some processes, and content can be hyperlinked to an online troubleshooting guide.
Terms “substantially”, “essentially”, or “approximately”, that may be used herein, relate to an industry-accepted tolerance for the corresponding term. Such an industry-accepted tolerance ranges from less than one percent to twenty percent and corresponds to, but is not limited to, functionality, values, process variations, sizes, operating speeds, and the like. The term “coupled”, as may be used herein, includes direct coupling and indirect coupling via another component, element, circuit, or module where, for indirect coupling, the intervening component, element, circuit, or module does not modify the information of a signal but may adjust its current level, voltage level, and/or power level. Inferred coupling, for example where one element is coupled to another element by inference, includes direct and indirect coupling between two elements in the same manner as “coupled”.
The illustrative block diagrams and flow charts depict process steps or blocks that may represent modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Although the particular examples illustrate specific process steps or acts, many alternative implementations are possible and commonly made by simple design choice. Acts and steps may be executed in different order from the specific description herein, based on considerations of function, purpose, conformance to standard, legacy structure, and the like.
While the present disclosure describes various embodiments, these embodiments are to be understood as illustrative and do not limit the claim scope. Many variations, modifications, additions and improvements of the described embodiments are possible. For example, those having ordinary skill in the art will readily implement the steps necessary to provide the structures and methods disclosed herein, and will understand that the process parameters, materials, and dimensions are given by way of example only. The parameters, materials, and dimensions can be varied to achieve the desired structure as well as modifications, which are within the scope of the claims. Variations and modifications of the embodiments disclosed herein may also be made while remaining within the scope of the following claims.