Unless otherwise indicated herein, the approaches described in this section are not admitted to be prior art by inclusion in this section.
Virtualization allows the abstraction and pooling of hardware resources to support virtual machines in a software-defined networking (SDN) environment, such as a software-defined data center (SDDC). For example, through server virtualization, virtualized computing instances such as virtual machines (VMs) running different operating systems (OSs) may be supported by the same physical machine (e.g., referred to as a host). Each virtual machine is generally provisioned with virtual resources to run an operating system and applications. The virtual resources may include central processing unit (CPU) resources, memory resources, storage resources, network resources, etc.
A software-defined approach may be used to create shared storage for VMs and/or for some other types of entities, thereby providing a distributed storage system in a virtualized computing environment. Such a software-defined approach virtualizes the local physical storage resources of each of the hosts and turns the storage resources into pools of storage that can be divided and accessed/used by VMs or other types of entities and their applications. The distributed storage system typically involves an arrangement of virtual storage nodes that communicate data with each other and with other devices.
It can be challenging to effectively and efficiently evaluate the health of a distributed storage system. Evaluating health issues (including determining their priority/urgency levels) can be challenging in distributed storage systems that are large-scale and deployed in a complex computing environment.
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. The aspects of the present disclosure, as generally described herein, and illustrated in the drawings, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
References in the specification to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, such feature, structure, or characteristic may be implemented in connection with other embodiments whether or not explicitly described.
The present disclosure addresses various drawbacks associated with evaluating health issues in a distributed system, such as a distributed storage system provided by a virtualized computing environment. Evaluation techniques in accordance with various embodiments categorize health issues based on at least three categories (e.g., storage data availability and accessibility, storage data performance, and storage space utilization and efficiency), and provide priority levels for the health issues within each category. In this manner, a more user-oriented approach is provided wherein in addition to identifying health issues, the priority/urgency level of the health issue(s) can be provided so as to guide the user (such as a system administrator) in determining an appropriate remedial action to perform and when such remedial action should be performed.
The embodiments provided herein enable a faster and more effective way to evaluate the overall health status of a distributed storage system, which is an important and useful capability for virtualization system administrators, technical support engineers, and other users, consumers, etc. Problems in the distributed storage system should be understood correctly so as to allow for swift and deliberate action(s) to resolve issues expediently and effectively. The monitoring of distributed storage systems can be challenging, especially at scale, as there may be many clusters of storage nodes composed of large numbers of servers with locally attached storage devices, all connected through a network. The embodiments disclosed herein enable health issues in such complex storage environments to be monitored, evaluated, and addressed.
In such a complex storage environment, it may be rather common to encounter many kinds of hardware/software failures or various performance spike behaviors. Hence, there typically may not be a standard answer as to what is good or bad health for such distributed storage systems. Some conventional approaches just evaluate the overall health of a distributed storage system by simply summing up all health issues that are detected (possibly with optimization approaches that introduce weights according to the severity of issues). However, such conventional approaches are rather naïve and cannot be easily implemented or understood by a user (e.g., a system administrator). For example, distributed storage systems are unique in that they may frequently exhibit behaviors that are expected for the distributed storage system, but such behavior(s) could potentially be misinterpreted by the system administrator as being indicative of bad health.
As a result, the overall health status may often be determined to be below expectations when evaluated using conventional evaluation approaches, thereby making such conventional approaches less useful from the user perspective. Rather, what would be useful and beneficial for the user, with respect to the evaluation of the distributed storage system, would be knowing at least the following:
Various embodiments of the health evaluation techniques disclosed herein address the foregoing three questions, in a manner that allows a system administrator or other user to easily identify a health issue that arises in a distributed storage system and take corrective action. At least the following benefits/advantages may be provided by the embodiments of the health evaluation technique:
Computing Environment with Health Evaluator
Various implementations will now be explained in more detail using
In the example in
The host-A 110A includes suitable hardware-A 114A and virtualization software (e.g., hypervisor-A 116A) to support various virtual machines (VMs). For example, the host-A 110A supports VM1 118 . . . VMY 120, wherein Y (as well as N) is an integer greater than or equal to 1. In practice, the virtualized computing environment 100 may include any number of hosts (also known as “computing devices”, “host computers”, “host devices”, “physical servers”, “server systems”, “physical machines,” etc.), wherein each host may be supporting tens or hundreds of virtual machines. For the sake of simplicity, the details of only the single VM1 118 are shown and described herein.
VM1 118 may include a guest operating system (OS) 122 and one or more guest applications 124 (and their corresponding processes) that run on top of the guest operating system 122. VM1 118 may include still further other elements 128, such as a virtual disk, agents, engines, modules, and/or other elements usable in connection with operating VM1 118, including using or otherwise interacting with a distributed storage system 152.
The hypervisor-A 116A may be a software layer or component that supports the execution of multiple virtualized computing instances. The hypervisor-A 116A may run on top of a host operating system (not shown) of the host-A 110A or may run directly on hardware-A 114A. The hypervisor-A 116A maintains a mapping between underlying hardware-A 114A and virtual resources (depicted as virtual hardware 130) allocated to VM1118 and the other VMs. The hypervisor-A 116A of some implementations may include/run one or more health monitoring agents 140 to monitor for health issues in the distributed storage system 152, in the host-A 110A, in the VMs running on the host-A 110A etc.
In some implementations, the agent 140 may reside elsewhere in the host-A 110A (e.g., outside of the hypervisor-A 116A), including running in a VM in some embodiments. In still other embodiments, the agent 140 may alternatively or additionally reside in a management server 142 and/or elsewhere in the virtualized computing environment 100, so as to monitor the health of hosts, network(s), the distributed storage system 152, and/or other components in the virtualized computing environment.
The hypervisor-A 116A may include or may operate in cooperation with still further other elements 141 residing at the host-A 110A. Such other elements 141 may include drivers, agent(s), daemons, engines, virtual switches, and other types of modules/units/components that operate to support the functions of the host-A 110A and its VMs, as well as functions associated with using storage resources of the host-A 110A for distributed storage.
Hardware-A 114A includes suitable physical components, such as CPU(s) or processor(s) 132A; storage resource(s) 134A; and other hardware 136A such as memory (e.g., random access memory used by the processors 132A), physical network interface controllers (NICs) to provide network connection, storage controller(s) to access the storage resource(s) 134A, etc. Virtual resources (e.g., the virtual hardware 130) are allocated to each virtual machine to support a guest operating system (OS) and application(s) in the virtual machine, such as the guest OS 122 and the applications 124 in VM1 118. Corresponding to the hardware-A 114A, the virtual hardware 130 may include a virtual CPU, a virtual memory, a virtual disk, a virtual network interface controller (VNIC), etc.
Storage resource(s) 134A may be any suitable physical storage device that is locally housed in or directly attached to host-A 110A, such as hard disk drive (HDD), solid-state drive (SSD), solid-state hybrid drive (SSHD), peripheral component interconnect (PCI) based flash storage, serial advanced technology attachment (SATA) storage, serial attached small computer system interface (SAS) storage, integrated drive electronics (IDE) disks, universal serial bus (USB) storage, etc. The corresponding storage controller may be any suitable controller, such as redundant array of independent disks (RAID) controller (e.g., RAID 1 configuration), etc.
The distributed storage system 152 may be connected to each of the host-A 110A . . . host-N 110N that belong to the same cluster of hosts. For example, the physical network 112 may support physical and logical/virtual connections between the host-A 110A . . . host-N 110N, such that their respective local storage resources (such as the storage resource(s) 134A of the host-A 110A and the corresponding storage resource(s) of each of the other hosts) can be aggregated together to form a shared pool of storage in the distributed storage system 152 that is accessible to and shared by each of the host-A 110A . . . host-N 110N, and such that virtual machines supported by these hosts may access the pool of storage to store data. In this manner, the distributed storage system 152 is shown in broken lines in
According to some implementations, two or more hosts may form a cluster of hosts that aggregate their respective storage resources to form the distributed storage system 152. The aggregated storage resources in the distributed storage system 152 may in turn be arranged as a plurality of virtual storage nodes. Other ways of clustering/arranging hosts and/or virtual storage nodes are possible in other implementations.
The management server 142 (or other network device configured as a management entity) of one embodiment can take the form of a physical computer with functionality to manage or otherwise control the operation of host-A 110A . . . host-N 110N, including operations associated with the distributed storage system 152. In some embodiments, the functionality of the management server 142 can be implemented in a virtual appliance, for example in the form of a single-purpose VM that may be run on one of the hosts in a cluster or on a host that is not in the cluster of hosts. The management server 142 may be operable to collect usage data associated with the hosts and VMs, to configure and provision VMs, to activate or shut down VMs, to monitor health conditions and evaluate and prioritize operational issues that pertain to health, and to perform other managerial tasks associated with the operation and use of the various elements in the virtualized computing environment 100 (including managing the operation of and accesses to the distributed storage system 152).
In some embodiments, a health evaluator 154 (described in further detail with respect to
The management server 142 may be a physical computer that provides a management console and other tools that are directly or remotely accessible to a system administrator or other user. The management server 142 may be communicatively coupled to host-A 110A . . . host-N 110N (and hence communicatively coupled to the virtual machines, hypervisors, hardware, distributed storage system 152, etc.) via the physical network 112. In some embodiments, the functionality of the management server 142 may be implemented in any of host-A 110A . . . host-N 110N, instead of being provided as a separate standalone device such as depicted in
A user may operate a user device 146 to access, via the physical network 112, the functionality of VM1 118 . . . VMY 120 (including operating the applications 124), using a web client 148. The user device 146 can be in the form of a computer, including desktop computers and portable computers (such as laptops and smart phones). In one embodiment, the user may be an end user or other consumer that uses services/components of VMs (e.g., the application 124) and/or the functionality of the distributed storage system 152. The user may also be a system administrator that uses the web client 148 of the user device 146 to remotely communicate with the management server 142 via a management console for purposes of performing management operations, including health-related operations pertaining to the distributed storage system 152.
Depending on various implementations, one or more of the physical network 112, the management server 142, and the user device(s) 146 can comprise parts of the virtualized computing environment 100, or one or more of these elements can be external to the virtualized computing environment 100 and configured to be communicatively coupled to the virtualized computing environment 100.
In operation, the agent 140 (also shown in
The health evaluator 154 (e.g., a service, agent, daemon, or other component) then collects (at 218) this storage health information (and/or other health information) from each of the managed hosts 210. In the embodiment depicted in
As will be described in further detail with respect to
While the foregoing has described an embodiment wherein the health evaluator 154 resides and performs its operations within the internal network 200, other embodiments may be provided wherein the health evaluator 154 resides in the external network 202. The external network 202 may include one or more computing devices 204 deployed at a cloud (e.g., a public cloud or a private cloud), which is used as an example hereinafter in some of the disclosed embodiments for purposes of simplicity of explanation; the computing devices 204 of other embodiments may be deployed in various types of external network arrangements that are not necessarily arranged as a cloud environment.
In such embodiments, the health evaluator 154 (shown in broken lines at the external network 202) may receive uploaded health information (at 220) from the management server 142 and/or from some other devices within the internal network 200. The health evaluator 154 may then perform operations to identify, categorize, and prioritize health issues, based on the health information that has been uploaded at 220. The health issue information (including categorization) and priority information may then be sent to the management server 142 (at 222) for evaluation by the user via the management console.
According to various embodiments, the health evaluator 154 may provide output (based on the health information that it processes), such as:
It is understood that the number of priority levels may vary from one implementation to another, and need not be strictly organized as priority levels P0-P4. For example, some implementations may use fewer priority levels, while other implementations may use a greater number of priority levels. Moreover, the assignment of a particular priority level to a particular health issue (action item) may also vary from one implementation to another. For example, one distributed storage system may experience a particular health issue that may be deemed to be priority level P1 and therefore requires a must-have remedial action, while the same/similar health issue may be deemed to be priority level P0 in a second distributed storage system and therefore requires immediate remedial action.
A next consideration for the health evaluator 154 is how to categorize the action items with different priority levels and user-visible impacts. According to various embodiments, there may typically be three primary health-related categories for distributed storage systems:
The three categories 1-3 above may be viewed as types of key performance indicators (KPIs) or analogous types of health indicators for a distributed storage system. For instance, any health issue that affects in-use data availability and accessibility (or more generally, a data accessibility health issue) of category 1 should be considered top priority, followed by performance under category 2, and then space utilization and efficiency under category 3. One or more priority levels can be assigned by the health evaluator 154 to each of the three categories, so that all health issues under a given category will share the same priority level(s). It is also possible for the health evaluator 154 to assign differing priority levels to various individual health issues that are categorized within each of these performance indicators (categories).
As an example, a category can be used to determine the priority for a specific health issue by evaluating which actually impacted category that specific health issue falls under. For instance, under normal circumstances, a high space utilization health issue under category 3 should have a lower priority level than priority level P0 (which requires immediate remedial action), since with a high space utilization condition in the distributed storage system 152, data and/or storage space is still available and accessible, albeit under a non-optimal condition. However, if storage space reaches a nearly full condition, which may cause the whole distributed storage system 152 to become inoperative with actual data availability impacts, then the priority level for the space utilization condition should be priority level P0 rather than a lower priority level.
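By way of a non-limiting illustration only, the categorization and escalation logic described above might be sketched as follows in Python. The category names, the default priority mapping, the HealthIssue fields, and the nearly-full threshold value are hypothetical placeholders chosen for this sketch and are not required by any particular implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    DATA_AVAILABILITY = 1    # category 1: storage data availability and accessibility
    PERFORMANCE = 2          # category 2: storage data performance
    SPACE_UTILIZATION = 3    # category 3: storage space utilization and efficiency


class Priority(Enum):
    P0 = 0   # immediate remedial action
    P1 = 1   # must-have remedial action
    P2 = 2   # lower-urgency remedial action
    P3 = 3
    P4 = 4


@dataclass
class HealthIssue:
    name: str
    category: Category
    space_used_ratio: float = 0.0   # fraction of capacity in use (meaningful for category 3)


# Default mapping: availability issues outrank performance issues,
# which in turn outrank utilization/efficiency issues.
DEFAULT_PRIORITY = {
    Category.DATA_AVAILABILITY: Priority.P0,
    Category.PERFORMANCE: Priority.P1,
    Category.SPACE_UTILIZATION: Priority.P2,
}

NEARLY_FULL_RATIO = 0.95   # hypothetical "nearly full" threshold


def assign_priority(issue: HealthIssue) -> Priority:
    """Assign a priority level based on the issue's category and its actual impact."""
    # A nearly-full system may become inoperative, so a category-3 issue is
    # escalated to P0 once it actually impacts data availability.
    if (issue.category is Category.SPACE_UTILIZATION
            and issue.space_used_ratio >= NEARLY_FULL_RATIO):
        return Priority.P0
    return DEFAULT_PRIORITY[issue.category]
```

With this sketch, a high space utilization issue at, say, 80% of capacity would map to a lower priority level, while the same issue at 96% would be escalated to priority level P0, mirroring the example above.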
The example method 300 may include one or more operations, functions, or actions illustrated by one or more blocks, such as blocks 302 to 310. The various blocks of the method 300 and/or of any other process(es) described herein may be combined into fewer blocks, divided into additional blocks, supplemented with further blocks, and/or eliminated based upon the desired implementation. In one embodiment, the operations of the method 300 and/or of any other process(es) described herein may be performed in a pipelined sequential manner. In other embodiments, some operations may be performed out-of-order, in parallel, etc.
The method 300 may begin at a block 302 (“OBTAIN STORAGE HEALTH INFORMATION”), wherein the health evaluator 154 receives, from one or more of the agent(s) 140, storage health information such as performance metrics and other health-related information pertaining to the distributed storage system 152. The health evaluator 154 may also receive, at the block 302, other health-related information such as health information regarding the hosts, network(s), and/or other components in the virtualized computing environment 100.
The block 302 may be followed by a block 304 (“DETECT HEALTH ISSUE AND IDENTIFY CATEGORY”), wherein based on the received health information, the health evaluator 154 may detect or otherwise determine that a health issue exists. For example, the health evaluator 154 may determine that the distributed storage system is at full capacity, one or more storage nodes or hosts are down (e.g., inaccessible), data throughput is less than expected, etc.
At the block 304, the health evaluator 154 may also identify, for each health issue, the impacted area and scope. For example, the health evaluator 154 may assign each of the detected health issues or other conditions to a particular one or more categories. Such categories may be categories 1-3 described above respectively pertaining to data accessibility/availability, data performance, space utilization/efficiency, etc.
The block 304 may be followed by a block 306 (“DETERMINE PRIORITY LEVEL”), wherein the health evaluator 154 determines the priority level to assign to each of the health issues. For example, the priority level of a health issue (and hence the priority level of the corresponding remedial action to address the health issue) may be one of the priority levels P0-P4. As previously explained above, the priority level assigned to a health issue may be based on an actual impact to an end user (e.g., no data availability/accessibility, increased latency, lower throughput, etc.).
The block 306 may be followed by a block 308 (“GENERATE SUMMARY”), wherein the health evaluator 154 generates an overall summary that may be presented to a system administrator via the management server 142. The information included in the summary may include, but not be limited to, a number of health issues detected, identification of each specific health issue, the priority level P0-P4 assigned to the health issue, the location of the health issue in the virtualized computing environment 100, the one of the categories 1-3 under which the health issue is assigned, etc.
In some embodiments, the overall summary may be provided as part of an alert when one or more health issues are detected. It is also possible for the overall summary to be generated according to a schedule, for example hourly, daily, weekly, etc.
The block 308 may be followed by a block 310 (“RECOMMEND REMEDIAL ACTION”), wherein the health evaluator 154 in cooperation with the management server provides a recommendation for a remedial action to address the health issue and when to perform the remedial action. For example, if the health issue is a resource utilization issue indicating that a low amount of storage capacity remains available for use, the recommendation provided at the block 310 may be a recommendation to the system administrator to provision a certain amount of additional storage capacity within 24 hours. In some embodiments, the recommendations provided at the block 310 may form part of the overall summary provided at the block 308.
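For illustration, the flow of blocks 302-310 might be reduced to the following Python sketch. The record fields, the example thresholds, and the recommendation text are assumptions made for this sketch and do not reflect an actual interface of the health evaluator 154.

```python
from dataclasses import dataclass


@dataclass
class HealthRecord:
    """One observation reported by a monitoring agent (illustrative fields only)."""
    location: str   # e.g., a host or storage node identifier
    metric: str     # e.g., "host_down", "avg_latency_ms", "capacity_used"
    value: float


def detect_and_categorize(record: HealthRecord):
    """Block 304: decide whether the record indicates a health issue and its category."""
    if record.metric == "host_down" and record.value:
        return ("host or disk inaccessible", 1)              # data availability/accessibility
    if record.metric == "avg_latency_ms" and record.value > 20.0:
        return ("average I/O latency above threshold", 2)    # data performance
    if record.metric == "capacity_used" and record.value > 0.5:
        return ("storage utilization above 50%", 3)          # utilization/efficiency
    return None


def priority_for(category: int, record: HealthRecord) -> str:
    """Block 306: assign a priority level (P0-P4) based on the actual user impact."""
    if category == 3 and record.value >= 0.95:
        return "P0"   # nearly full: escalated because availability is actually impacted
    return {1: "P0", 2: "P1", 3: "P2"}[category]


def evaluate(records):
    """Blocks 302-310: detect, categorize, prioritize, and summarize health issues."""
    summary = []                                   # block 308: overall summary
    for record in records:                         # block 302: collected health information
        found = detect_and_categorize(record)
        if found is None:
            continue
        issue, category = found
        summary.append({
            "issue": issue,
            "category": category,
            "priority": priority_for(category, record),
            "location": record.location,
            # Block 310: a recommended remedial action accompanies the issue.
            "recommendation": f"address '{issue}' at {record.location}",
        })
    return summary
```

For example, under these placeholder thresholds, a record reporting a capacity_used value of 0.97 on a given host would yield a P0, category-3 entry in the summary.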
Further details are provided next with respect to how the health evaluator 154 detects and identifies health issues that fall within the three categories 1-3 (e.g., blocks 302-306), and reports (e.g., via the summary at the block 308) the corresponding remedial actions with corresponding priority levels.
Data Availability and Accessibility Evaluation
According to various embodiments, the health evaluator 154 (in cooperation with the management server 142) attempts to ensure that new data can be written as long as there is sufficient free storage space in the distributed storage system 152 and that old/existing data can be read from storage. To perform this task of ensuring data accessibility/availability, the health evaluator 154 uses the agents 140 to monitor the hardware health status of the hosts/devices that are used for storing data.
However, it may be rather complicated to evaluate data accessibility for a distributed storage system having a high availability (HA) design, since data may be split into multiple pieces or replicated as multiple copies that are stored on multiple hosts, for purposes of better performance or better fault tolerance. Hence, if there is a network partition amongst hosts (which means that some of the hosts are isolated from other hosts due to a network connectivity issue), the data may only be accessible from some specific hosts. In such a case, the health evaluator 154 determines how that stored data is accessed (or referred to) by a consumer of the data. For example, one type of distributed storage system is an object storage system whose data object is attached as a virtual disk consumed by a virtual machine (VM). If the VM is placed in a host that can access the data, then there is no data availability issue. However, if the VM is placed in a host that cannot access the data, then there is a data availability issue for that VM.
Example priority levels for a data availability and accessibility issue may be defined according to the following:
The method 400 may begin at a block 402 (“FOR EACH DATA OBJECT/BLOCK”) and a block 404 (“IDENTIFY THE HOSTS/DISKS USED FOR SAVING THE DATA”), wherein for each piece of data (such as a data object or a data block), the health evaluator 154 identifies the hosts and/or disks that are used for saving that data. Such information may be provided to the health evaluator 154 by the management server 142, by the distributed storage system 152, by the agents 140, and/or by other components.
The blocks 402/404 may be followed by a block 406 (“IS THERE AN OPERATIONAL ISSUE?”), wherein the health evaluator 154 determines whether the health information provided by the agents 140 indicates that there is an operational/health issue for the hosts/disks. If there is an operational issue (“YES” at the block 406), such as one or more of the hosts/disks that store the data being down, then the health evaluator 154 assigns a priority level P0 to this health issue and reports (such as via a summary or alert) the priority level P0 for the data availability issue, at a block 408 (“REPORT P0 DATA AVAILABILITY ISSUE”).
If, however, there is no operational issue for the hosts/disks detected at the block 406 (“NO” at the block 406), then the method 400 proceeds to determine whether a network partition exists, at a block 410 (“IS THERE A NETWORK PARTITION?”). If the health evaluator 154 determines that there is no network partition (“NO” at the block 410), then the method 400 proceeds to determine if there are any other health issues for the hosts/disks that exist or that may be predicted, at a block 412 (“IS THERE OTHER HEALTH ISSUE?”).
If there are no other health issues (“NO” at the block 412), then the health evaluator 154 generates an output indicating that no health issues exist and that no action needs to be taken, at a block 414 (“RETURN GREEN RESULT”). If, however, other health issues are determined to exist (“YES” at the block 412), then the health evaluator 154 assigns a priority level P1 to this health issue and reports (such as via a summary or alert) the priority level P1 for the data availability issue at a block 416 (“REPORT P1 DATA AVAILABILITY ISSUE”). The data availability issue may be a performance related issue, for example, such as latency or reduced throughput.
Back at the block 410, if the health evaluator 154 determines that a network partition exists (“YES” at the block 410), then a series of operations are performed to determine whether the hosts that store the data are in the same or different partitions, whether the consumers (e.g., VMs) of the data are in the same or different hosts in the same partition, etc. Generally, if all of the consumers are able to access the data, then a lower priority level can be given to this health issue, as compared to a higher priority level condition wherein less than all of the consumers are able to access the data due to the isolation/separation caused by the network partition. This determination process is described next.
At a block 418 (“ALL HOSTS SAVING THE DATA IN SAME PARTITION?”), the health evaluator 154 determines whether all of the hosts that store the data are in the same partition. If such hosts are in different partitions (“NO” at the block 418), then such a condition results in some consumers at some hosts being able to access the data and other consumers at other hosts being unable to access the data. Accordingly, the method 400 proceeds to assign a priority level P0 to this health issue and reports (such as via a summary or alert) the priority level P0 for the data availability issue, at the block 408 (“REPORT P0 DATA AVAILABILITY ISSUE”).
However, if all of the hosts that store the data are in the same partition (“YES” at the block 418), then the health evaluator 154 identifies all of the consumers (e.g., VMs) of the data, at a block 420 (“IDENTIFY ALL CONSUMERS”). Next, the health evaluator 154 determines whether all of the consumers are in the same host in the same partition, at a block 422 (“ALL CONSUMERS IN SAME HOST IN SAME PARTITION?”). If all of the consumers are in the same host (storing the data) in the same partition (“YES” at the block 422), then such a condition results in all of these consumers being able to access the data despite the presence of the network partition. The method 400 then proceeds to assign a priority level P1 to this health issue and reports (such as via a summary or alert) the priority level P1 for the data availability issue, at a block 424 (“REPORT P1 DATA AVAILABILITY ISSUE”).
If, back at the block 422, the health evaluator 154 determines that not all of the consumers are in the same host in the same partition (“NO” at the block 422), then such a condition results in an impact in which some consumers are able to access the data and other consumers are unable to access the data. The method 400 then proceeds to assign a priority level P0 to this health issue and reports (such as via a summary or alert) the priority level P0 for the data availability issue at the block 408 (“REPORT P0 DATA AVAILABILITY ISSUE”), so as to give this health issue the highest (immediate) priority level for performing a remedial action.
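Purely as an illustrative sketch, the decision flow of blocks 402-424 might be summarized in the following Python function. The sketch simplifies the block 422 check to partition membership of the consumers' hosts and reduces the “other health issue” test of block 412 to a boolean input; these simplifications, and the data structures used, are assumptions made for the example rather than requirements of the method 400.

```python
def evaluate_data_availability(storing_hosts, consumer_hosts, host_is_down, partition_of,
                               other_issue_exists=False):
    """Illustrative walk through blocks 402-424 for one data object/block.

    storing_hosts:      hosts/disks used for saving the data (block 404)
    consumer_hosts:     hosts where consumers (e.g., VMs) of the data run (block 420)
    host_is_down:       dict mapping host -> True if an operational issue exists (block 406)
    partition_of:       dict mapping host -> network-partition identifier
    other_issue_exists: any other existing/predicted issue for the hosts/disks (block 412)
    """
    # Blocks 406/408: a storing host/disk being down is an immediate (P0) availability issue.
    if any(host_is_down.get(h, False) for h in storing_hosts):
        return "P0"

    all_hosts = set(storing_hosts) | set(consumer_hosts)
    network_partitioned = len({partition_of[h] for h in all_hosts}) > 1

    # Blocks 410/412/414/416: no network partition, so only "other" health issues remain.
    if not network_partitioned:
        return "P1" if other_issue_exists else "GREEN"

    # Block 418: the data is spread across partitions, so some consumers lose access (P0).
    storing_partitions = {partition_of[h] for h in storing_hosts}
    if len(storing_partitions) > 1:
        return "P0"

    # Blocks 420/422/424: the data sits in one partition; if every consumer is placed on a
    # host within that same partition, the data remains reachable and the issue is P1.
    data_partition = next(iter(storing_partitions))
    if all(partition_of[c] == data_partition for c in consumer_hosts):
        return "P1"

    return "P0"   # block 408: at least one consumer cannot reach the data
```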
Storage Performance Evaluation
Typically, there may be two metrics (e.g., latency and throughput) that can be used for evaluating storage performance. However, throughput is often directly related to the user workload and so may not be a reliable indicator. Instead, various embodiments use latency as the main indicator for evaluating the performance health of the distributed storage system 152, because a throughput issue in the distributed storage system 152 will eventually cause high latency in some way. The health evaluator 154 is thus configured to measure for any performance downgrade (latency) that needs to be addressed through a remedial action. Such an approach may involve:
However, it may be difficult in some situations to determine a proper latency threshold so as to avoid either a false negative or a false positive. There may be two typical false negative cases for latency:
For case A, the health evaluator 154 leverages the throughput metric to determine if the average latency is expected or not. For case B, the health evaluator 154 first builds the latency historical data per owner data object/block, and then checks the owner object/block distribution for all of high latency I/Os. Hence, the following two example cases may be possible:
Furthermore, since the I/O latency is impacted by I/O size, the health evaluator 154 is configured to measure the latency based on I/O size. For example, the health evaluator 154 may build a respective I/O latency evaluation model for small and large I/Os (e.g., different storage systems can define different small and large I/O sizes).
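A minimal sketch of such an I/O-size-aware latency model is given below. The size boundary and the millisecond thresholds are placeholders only, since, as noted above, different storage systems can define different small and large I/O sizes and their own latency expectations.

```python
# A minimal sketch of per-size latency thresholds; the boundary and threshold
# values below are placeholder assumptions, not values taken from the description.
SMALL_IO_MAX_BYTES = 64 * 1024
LATENCY_THRESHOLD_MS = {"small": 5.0, "large": 30.0}


def io_size_class(size_bytes: int) -> str:
    """Classify an I/O as small or large based on a single hypothetical boundary."""
    return "small" if size_bytes <= SMALL_IO_MAX_BYTES else "large"


def is_high_latency(size_bytes: int, latency_ms: float) -> bool:
    """Evaluate an individual I/O against the threshold for its size class."""
    return latency_ms > LATENCY_THRESHOLD_MS[io_size_class(size_bytes)]
```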
Example priority levels for a performance issue may be defined according to the following:
With reference first to
If the average I/O latency does not exceed the threshold (“NO” at the block 504), then the health evaluator 154 generates an output indicating that no performance health issues exist and that no action needs to be taken, at a block 506 (“RETURN GREEN RESULT”).
However, if back at the block 504, the health evaluator 154 determines that the threshold has been exceeded (“YES” at the block 504) or is close to being reached, then the method 500 proceeds to a block 508 (“IS THERE THROUGHPUT DROP?”). At the block 508, the health evaluator 154 determines whether there is an obvious or otherwise significant throughput drop during the same time period. If the health evaluator 154 determines that there is a throughput drop (“YES” at the block 508), then the method 500 proceeds to assign a priority level P1 to this performance health issue and reports (such as via a summary or alert) the priority level P1 for the health issue, at a block 510 (“REPORT P1 STORAGE PERFORMANCE ISSUE”).
If, back at the block 508, the health evaluator 154 determines that there is no throughput drop (“NO” at the block 508), then the method 500 proceeds to a block 512 (“DOES THE WORKLOAD REACH/EXCEED MAX?”), wherein the health evaluator 154 determines whether the workload is close to reaching or has exceeded the maximum level of supported workload.
If the maximum supported workload is determined to not have been exceeded (“NO” at the block 512), then the method 500 repeats starting at the block 502. However, if the maximum workload size is determined to have been exceeded (“YES” at the block 512) or is close to being reached, then the method 500 proceeds to assign a priority level P2 (a relatively lower priority level) to this performance health issue and reports (such as via a summary or alert) the priority level P2 for the health issue, at a block 514 (“REPORT P2 STORAGE PERFORMANCE ISSUE”).
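As an illustration, the average-latency portion of the method 500 (blocks 502-514) might be condensed into the following Python sketch, with the threshold comparison, throughput-drop test, and maximum-workload test reduced to inputs; the names used are assumptions for the example, not an actual interface.

```python
def evaluate_average_latency(avg_latency_ms, latency_threshold_ms,
                             throughput_dropped, workload_at_or_above_max):
    """Blocks 502-514: classify the average-latency condition of the storage system."""
    # Blocks 504/506: average latency within the threshold means no performance issue.
    if avg_latency_ms <= latency_threshold_ms:
        return "GREEN"
    # Blocks 508/510: high latency accompanied by a throughput drop is a P1 issue.
    if throughput_dropped:
        return "P1"
    # Blocks 512/514: latency is high because the workload reaches/exceeds the supported
    # maximum, which is a lower-priority (P2) issue; otherwise keep monitoring (block 502).
    return "P2" if workload_at_or_above_max else "MONITOR"
```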
The method 500 then proceeds to evaluate individual I/O latency, such as shown next in
At a block 516 (“MONITOR EACH I/O AND STORE PER I/O SIZE”) in
The block 516 may be followed by a block 518 (“I/O STUCK?”), wherein the health evaluator 154 determines whether an I/O is stuck. As previously explained above, a stuck I/O can be perceived by a user as inaccessible data. As such, if the health evaluator 154 determines that the I/O is stuck (“YES” at the block 518), then the method 500 proceeds to assign a priority level P0 (an urgent priority level) to this performance health issue and reports (such as via a summary or alert) the priority level P0 for the health issue, at a block 520 (“REPORT P0 STORAGE PERFORMANCE ISSUE”).
If, however, the I/O is determined to not be stuck (“NO” at the block 518), then the method 500 proceeds to a block 522 (“I/Os with high latency detected?”). For example, the health evaluator 154 determines whether there are individual I/Os with high latency that have been continuously detected. If no such high latency I/Os are detected (“NO” at the block 522), then the health evaluator 154 generates an output indicating that no performance health issues exist and that no action needs to be taken, at a block 524 (“RETURN GREEN RESULT”).
If, however, I/Os with high latency have been continuously detected (“YES” at the block 522), then the health evaluator 154 determines whether these high latency I/Os come from random owner objects/blocks, at a block 526 (“RANDOM?”). If determined to come from random owner objects/blocks (“YES” at the block 526), then such a condition is indicative of a performance issue. As such, the method 500 proceeds to assign a priority level P1 to this performance health issue and reports (such as via a summary or alert) the priority level P1 for the health issue, at a block 528 (“REPORT P1 STORAGE PERFORMANCE ISSUE”).
If the high latency I/Os are determined to not come from random owner objects/blocks (“NO” at the block 526), then the method 500 proceeds to a block 530 (“RELATE TO WORKLOAD CHARACTERISTICS?”). At the block 530, the health evaluator 154 determines whether the high latency I/Os relate to workload characteristics. If determined to not be related to workload characteristics (“NO” at the block 530), then the method 500 proceeds to a block 532 (“KEEP MONITORING FOR NEXT CYCLE”), in which the health evaluator 154 continues monitoring the I/Os.
Otherwise if the high latency I/O relates to workload characteristics (“YES” at the block 530), the method 500 proceeds to assign a priority level P2 to this performance health issue and reports (such as via a summary or alert) the priority level P2 for the health issue, at a block 534 (“REPORT P2 STORAGE PERFORMANCE ISSUE”).
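Similarly, the individual-I/O portion of the method 500 (blocks 516-534) might be sketched as follows, with the stuck-I/O, random-owner, and workload-characteristic tests reduced to boolean inputs for brevity; this is an illustrative sketch rather than a definitive implementation.

```python
def evaluate_individual_ios(any_io_stuck, high_latency_ios_detected,
                            from_random_owner_objects, relates_to_workload):
    """Blocks 516-534: classify continuously observed individual I/O latency behavior."""
    if any_io_stuck:                        # blocks 518/520: perceived as inaccessible data
        return "P0"
    if not high_latency_ios_detected:       # blocks 522/524: nothing abnormal observed
        return "GREEN"
    if from_random_owner_objects:           # blocks 526/528: scattered across owner objects
        return "P1"
    if relates_to_workload:                 # blocks 530/534: explained by workload pattern
        return "P2"
    return "MONITOR"                        # block 532: keep monitoring for the next cycle
```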
Storage Utilization and Efficiency Evaluation
According to various embodiments, the health evaluator 154 may evaluate at least the following regarding storage utilization and efficiency:
Example priority levels for a storage utilization and efficiency issue may be defined according to the following:
The method 600 may begin at a block 602 (“OBTAIN STORAGE SPACE UTILIZATION INFORMATION”), wherein the health evaluator 154 obtains storage space utilization information from the agents 140. The block 602 may be followed by a block 604 (“REACHING NEARLY FULL?”), wherein the health evaluator 154 determines whether the storage utilization will reach or has reached a nearly full condition. In such a nearly full condition, the entire distributed storage system 152 may become non-operational or non-functional.
If the storage utilization is determined to be reaching the nearly full condition (“YES” at the block 604), then the method 600 proceeds to assign a priority level P0 to this storage utilization health issue and reports (such as via a summary or alert) the priority level P0 for the health issue, at a block 606 (“REPORT P0 STORAGE UTILIZATION ISSUE”).
If, however, the storage utilization is determined to not be reaching the nearly full condition (“NO” at the block 604), then the method 600 proceeds to a block 608 (“INSUFFICIENT SPACE FOR REBUILD?”). At the block 608, the health evaluator 154 determines whether the storage utilization will reach or has reached a threshold in which there is insufficient storage space to rebuild data in case a disk/host failure occurs. If the storage utilization is determined to be near such threshold (“YES” at the block 608), then the method 600 proceeds to assign a priority level P1 to this storage utilization health issue and reports (such as via a summary or alert) the priority level P1 for the health issue, at a block 610 (“REPORT P1 STORAGE UTILIZATION ISSUE”).
If, however, the storage utilization is determined to not have reached or approached the threshold (“NO” at the block 608), then the method 600 proceeds to a block 612 (“REACHING XX% FULL?”). At the block 612, the health evaluator 154 determines whether the storage utilization will reach or has reached a certain percentage level, such as 50% full. If not reached/approaching the percentage level (“NO” at the block 612), then the health evaluator 154 generates an output indicating that no storage utilization health issues exist and that no action needs to be taken, at a block 614 (“RETURN GREEN RESULT”).
If, however, the health evaluator 154 determines that the storage utilization will reach or has reached a certain percentage level (“YES” at the block 612), then the method 600 proceeds to a block 616 (“OTHER IMPROVEMENT IN EFFICIENCY?”), wherein the health evaluator 154 determines whether there is any opportunity to improve the storage space efficiency. If such opportunities are determined to be available (“YES” at the block 616), then the method 600 proceeds to assign a priority level P2 to this storage efficiency health issue and reports (such as via a summary or alert) the priority level P2 for the health issue along with a recommendation for improving storage efficiency, at a block 618 (“REPORT P2 STORAGE EFFICIENCY RECOMMENDATION”).
Otherwise if there are no opportunities to improve the storage space efficiency (“NO” at the block 616), the method 600 proceeds to assign a priority level P2 to this storage utilization health issue and reports (such as via a summary or alert) the priority level P2 for the health issue, at a block 620 (“REPORT P2 STORAGE UTILIZATION ISSUE”).
Various checks can be performed at a block 622 (“PERFORM CHECK(S)”) to determine whether the storage efficiency may be improved. For example, the health evaluator 154 can check one or more of: whether there is a data object/block that has reserved more storage space than what is expected/needed, whether there is cold data that has not experienced any I/O for a lengthy period of time, whether storage efficiency features such as deduplication or compression have been enabled, etc.
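As an illustrative sketch only, the decision flow of blocks 602-622 might be expressed as follows; the default threshold ratios and the list of efficiency opportunities are placeholder assumptions for the example.

```python
def evaluate_space_utilization(used_ratio, nearly_full_ratio=0.95,
                               rebuild_reserve_ratio=0.75, review_ratio=0.50,
                               efficiency_opportunities=()):
    """Blocks 602-622: classify storage space utilization and efficiency.

    used_ratio:               fraction of total capacity currently in use (block 602)
    nearly_full_ratio:        level beyond which the whole system may become inoperative
    rebuild_reserve_ratio:    level beyond which data could not be rebuilt after a failure
    review_ratio:             the "XX% full" review threshold (block 612)
    efficiency_opportunities: e.g., over-reserved objects, cold data, dedup/compression off
    """
    if used_ratio >= nearly_full_ratio:                        # blocks 604/606
        return ("P0", "storage utilization issue")
    if used_ratio >= rebuild_reserve_ratio:                    # blocks 608/610
        return ("P1", "insufficient space to rebuild data after a disk/host failure")
    if used_ratio < review_ratio:                              # blocks 612/614
        return ("GREEN", None)
    if efficiency_opportunities:                               # blocks 616/618/622
        return ("P2", "efficiency recommendation: " + ", ".join(efficiency_opportunities))
    return ("P2", "storage utilization issue")                 # block 620
```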
Therefore, in accordance with the various embodiments described above, a user-oriented approach to evaluating the health of a distributed storage system is provided. Such an approach can help a system administrator or other technical support staff to easily identify an issue and take corrective action. Compared to existing solutions, the approach(es) described herein: enables evaluation of the system health based on real user impacts (e.g., the impact to the storage data as well as the application using that data), which is a good fit for a large-scale distributed storage system; simplifies a complicated storage system's health into categories (e.g., three categories) that are the most user-friendly and useful; and provides a generic and systematic way to evaluate a distributed storage system.
Computing Device
The above examples can be implemented by hardware (including hardware logic circuitry), software or firmware or a combination thereof. The above examples may be implemented by any suitable computing device, computer system, etc. The computing device may include processor(s), memory unit(s) and physical NIC(s) that may communicate with each other via a communication bus, etc. The computing device may include a non-transitory computer-readable medium having stored thereon instructions or program code that, in response to execution by the processor, cause the processor to perform processes described herein with reference to
The techniques introduced above can be implemented in special-purpose hardwired circuitry, in software and/or firmware in conjunction with programmable circuitry, or in a combination thereof. Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and others. The term “processor” is to be interpreted broadly to include a processing unit, ASIC, logic unit, or programmable gate array etc.
Although examples of the present disclosure refer to “virtual machines,” it should be understood that a virtual machine running within a host is merely one example of a “virtualized computing instance” or “workload.” A virtualized computing instance may represent an addressable data compute node or isolated user space instance. In practice, any suitable technology may be used to provide isolated user space instances, not just hardware virtualization. Other virtualized computing instances may include containers (e.g., running on top of a host operating system without the need for a hypervisor or separate operating system; or implemented as an operating system level virtualization), virtual private servers, client computers, etc. The virtual machines may also be complete computation environments, containing virtual equivalents of the hardware and system software components of a physical computing system. Moreover, some embodiments may be implemented in other types of computing environments (which may not necessarily involve a virtualized computing environment and/or a distributed storage system), wherein it would be beneficial to categorize and prioritize health issues based on impact.
The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or any combination thereof.
Some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computing systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and designing the circuitry and/or writing the code for the software and/or firmware are possible in light of this disclosure.
Software and/or other computer-readable instructions to implement the techniques introduced here may be stored on a non-transitory computer-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors. A “computer-readable storage medium”, as the term is used herein, includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant (PDA), mobile device, manufacturing tool, any device with a set of one or more processors, etc.). A computer-readable storage medium may include recordable/non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), magnetic disk or optical storage media, flash memory devices, etc.).
The drawings are only illustrations of an example, wherein the units or procedures shown in the drawings are not necessarily essential for implementing the present disclosure. The units in the device in the examples can be arranged in the device in the examples as described, or can alternatively be located in one or more devices different from that in the examples. The units in the examples described can be combined into one module or further divided into a plurality of sub-units.
The present application claims the benefit of Patent Cooperation Treaty (PCT) Application No. PCT/CN2022/097304, filed Jun. 7, 2022, which is incorporated herein by reference.