Unless otherwise indicated herein, the approaches described in this section are not admitted to be prior art by inclusion in this section.
Virtualization allows the abstraction and pooling of hardware resources to support virtual machines in a software-defined networking (SDN) environment, such as a software-defined data center (SDDC). For example, through server virtualization, virtualized computing instances such as virtual machines (VMs) running different operating systems (OSs) may be supported by the same physical machine (e.g., referred to as a host). Each virtual machine is generally provisioned with virtual resources to run an operating system and applications. The virtual resources may include central processing unit (CPU) resources, memory resources, storage resources, network resources, etc.
In many virtualized computing environments, test and development workloads may be mixed with production workloads. As an example, test and development workloads in a virtualized computing environment may be implemented as one or more VMs running software that is being debugged. Production workloads in the virtualized computing environment may be implemented as one or more other VMs that perform normal and routine day-to-day tasks/operations, such as business processes etc. Users (including system administrators) mix these two types of workloads so as to use the available resources (e.g., memory/storage capacity, processors, etc.) as optimally as possible and so as to reduce the total cost of ownership/use of the virtualized computing environment.
In some situations, a virtualized computing environment is run close to maximum capacity such that most of the resources are being utilized for the workloads.
During normal operations, running close to or at maximum capacity does not pose any significant problems. However, when a failure occurs in host(s) in the virtualized computing environment and workloads on those host(s) need to be restarted at other host(s), it is possible that insufficient resources are available to enable the workloads to be restarted at the other host(s).
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. The aspects of the present disclosure, as generally described herein, and illustrated in the drawings, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
References in the specification to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, such feature, structure, or characteristic may be effected in connection with other embodiments whether or not explicitly described.
The present disclosure addresses drawbacks described above that are associated with restarting virtualized computing instances (VCIs) such as VMs and containers on a host in a virtualized computing environment when insufficient resources are available after a failure occurs. An automatic emergency response (AER) system/method is provided that allows users to specify which virtualized computing instance(s) can be powered off in a case where insufficient resources are available within a cluster to power-on (restart) higher priority virtualized computing instances impacted by the failure. The powering off of relatively lower priority virtualized computing instances enables available resources to be provided to relatively higher priority virtualized computing instances so that such relatively higher priority virtualized computing instances can be powered on to utilize the provided resources.
To further explain the details of the AER system/method and how the AER system/method addresses the issues associated with resource availability when powering on virtualized computing instances after a failure, reference is first made herein to
In the example in
The host-A 110A includes suitable hardware 114A and virtualization software (e.g., a hypervisor-A 116A) to support various virtual machines (VMs). For example, the host-A 110A supports VM1 118 . . . VMX 120. In practice, the virtualized computing environment 100 may include any number of hosts (also known as computing devices, host computers, host devices, physical servers, server systems, physical machines, etc.), wherein each host may be supporting tens or hundreds of virtual machines. For the sake of simplicity, the details of only the single VM1 118 are shown and described herein.
VM1 118 may be a guest VM that includes a guest operating system (OS) 122 and one or more guest applications 124 (and their corresponding processes) that run on top of the guest OS 122. VM1 118 may include other elements 138, such as agents, code and related data (including data structures), engines, etc., which will not be explained herein in further detail, for the sake of brevity.
The hypervisor-A 116A may be a software layer or component that supports the execution of multiple virtualized computing instances. The hypervisor-A 116A may run on top of a host operating system (not shown) of the host-A 110A or may run directly on hardware 114A. The hypervisor-A 116A maintains a mapping between underlying hardware 114A and virtual resources (depicted as virtual hardware 130) allocated to VM1 118 and the other VMs. The hypervisor-A 116A may include an agent 140 that communicates with a management server 142. The agent 140 may be configured to, for example, report information associated with the host-A 110A and its VMs to the management server 142, such as identifying the VMs that are running on the host-A 110A, which resources are being used by the VMs, an amount of resources that are reserved for use by the VMs, an amount of available (unreserved) resources, etc.
Hardware 114A in turn includes suitable physical components, such as central processing unit(s) (CPU(s)) or processor(s) 132A; storage device(s) 134A; and other hardware 136A such as physical network interface controllers (NICs), storage disk(s) accessible via storage controller(s), etc. Virtual resources (e.g., the virtual hardware 130) are allocated to each virtual machine to support a guest operating system (OS) and application(s) in the virtual machine, such as the guest OS 122 and the application(s) 124 (e.g., a word processing application, accounting software, a browser, etc.) in VM1 118. Corresponding to the hardware 114A, the virtual hardware 130 may include a virtual CPU, a virtual memory (including the guest memory 138), a virtual disk, a virtual network interface controller (VNIC), etc. According to various embodiments described herein, the AER system/method determines which VMs to restart and which to power off after a failure, based on the availability of the resources (e.g., the virtual hardware 130) that may be allocated to support the restarting of virtual machines.
The management server 142 of one embodiment can take the form of a physical computer with functionality to manage or otherwise control the operation of host-A 110A . . . host-N 110N, including determining when certain hosts have failed. In some embodiments, the functionality of the management server 142 can be implemented in a virtual appliance, for example in the form of a single-purpose VM that may be run on one of the hosts in a cluster or on a host that is not in the cluster. The functionality of the management server 142 may be accessed via one or more user devices 146 that are operated by a system administrator. For example, the user device 146 may include a web client 148 (such as a browser-based application) that provides a user interface operable by the system administrator to access the management server 142, such as for purposes of identifying failed hosts, determining resource requirements and utilization, performing the AER method (described later below, including setting restart priority levels, configuring power off settings for VMs, etc.), troubleshooting, performing security/maintenance tasks, and performing other management-related operations.
The management server 142 may be communicatively coupled to host-A 110A . . . host-N 110N (and hence communicatively coupled to the virtual machines, hypervisors, agents, hardware, etc.) via the physical network 112. In some embodiments, the functionality of the management server 142 may be implemented in any of host-A 110A . . . host-N 110N, instead of being provided as a separate standalone device such as depicted in
The host-A 110A . . . host-N 110N may be configured as a datacenter that is managed by the management server 142. The host-A 110A . . . host-N 110N may form a single cluster of hosts, which together are located at the same geographical site. Other deployment configurations are possible. For instance, in a stretched cluster configuration, two or more hosts are part of the same logical cluster but are located in separate geographical sites.
Depending on various implementations, one or more of the physical network 112, the management server 142, and the user device(s) 146 can comprise parts of the virtualized computing environment 100, or one or more of these elements can be external to the virtualized computing environment 100 and configured to be communicatively coupled to the virtualized computing environment 100.
When a failure occurs (e.g., the host-A 110A becomes inoperative), a high availability (HA) utility 144 attempts to power on (restart) the impacted VMs (VM1 118 . . . VMX 120) on the remaining other hosts in the cluster. However, the HA utility 144 (and the respective hypervisors at these other hosts) can only restart the VMs when sufficient free (and unreserved) resources are available within the cluster to permit the restart. The agent 140 may report (to the HA utility 144) the active VMs on the host, the resource utilization of the VMs, the amount of reserved and unreserved resources, etc. For instance, the VM1 118 on the failed host-A 110A might require 7 GB of memory to restart/operate. Some other and currently active host in the same cluster might have 50 GB of memory, of which 45 GB are reserved for currently running VMs on that active host and for other uses (and so unavailable for use in restarting failed VMs from failed hosts), thereby leaving 5 GB of memory as unreserved/available. This information is reported by the agent 140 to the HA utility 144. Hence, the failed VM1 118 is not restarted by the HA utility 144 on that active host, since there are insufficient memory resources (e.g., only 5 GB) to support the 7 GB requirement of the VM1 118 from the failed host-A 110A.
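The kind of admission check described above can be illustrated with a minimal Python sketch, assuming for simplicity that memory is the only resource being tracked; the names VirtualMachine, Host, and can_restart are hypothetical and are not meant to represent any actual HA utility implementation.

```python
# Minimal sketch of the admission check described above: a failed VM may only
# be restarted on an active host whose unreserved memory covers the VM's
# requirement. All names here are hypothetical (illustration only).
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualMachine:
    name: str
    required_memory_gb: float


@dataclass
class Host:
    name: str
    total_memory_gb: float
    reserved_memory_gb: float
    running: List[VirtualMachine] = field(default_factory=list)

    @property
    def unreserved_memory_gb(self) -> float:
        return self.total_memory_gb - self.reserved_memory_gb


def can_restart(vm: VirtualMachine, host: Host) -> bool:
    """Return True only if the host has enough unreserved memory for the VM."""
    return host.unreserved_memory_gb >= vm.required_memory_gb


# Example mirroring the numbers above: VM1 needs 7 GB, but the active host has
# only 5 GB unreserved (50 GB total, 45 GB reserved), so the restart is denied.
vm1 = VirtualMachine("VM1", required_memory_gb=7.0)
active_host = Host("host-B", total_memory_gb=50.0, reserved_memory_gb=45.0)
assert can_restart(vm1, active_host) is False
```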
In some embodiments of the virtualized computing environment 100, the management server 142 can specify an amount of “performance degradation that can be tolerated”. Thus, in the foregoing example, VM1 118 from the failed host can possibly be restarted using only the 5 GB of memory available on the active host, if the system administrator (via the management server 142) has specified that VM1 118 can be restarted and operate with less than 7 GB of memory resources and has specified that a substantial performance degradation associated with operating at that lower amount of 5 GB of memory can be tolerated. However, this feature to specify an amount of performance degradation that can be tolerated is not practical in many situations—for instance, some critical workloads have strict resource requirements that need to be met in order to restart and operate adequately after a failure. As will be further described later below, the AER system/method addresses this issue by enabling non-critical workloads to be powered off in a scenario where critical workloads cannot be restarted during a failure as a result of resource shortages.
Furthermore, as an example with respect to a stretched cluster deployment configuration wherein workloads are deployed in a single logical cluster that spans two distinct geographic sites, 50% of the resources are reserved so as to guarantee that every workload can be restarted when a full site failure occurs at one of the sites. This reservation of such a large amount of resources, for allocation to workloads in the event of a failure, is inefficient, since such resources generally remain unutilized unless and until a failure occurs.
Furthermore, virtualized computing environments that combine test and development workloads with production workloads can experience a situation where production workloads are impacted (not available) after a failure. This is because insufficient resources are available to restart the production workloads on the remaining site after a full failure is experienced at the other site, since the remaining site is continuing to run both test and development workloads and its own production workloads, and therefore has insufficient available (unreserved) resources to support restarting the production workloads that were running on the failed site.
To address the foregoing and other issues, an embodiment of the AER system/method enables users to specify which VMs can be powered off when there are insufficient resources available to restart VMs (production workloads) that are impacted as a result of a failure of one or more hosts. The VMs that can be powered off can include test and development workloads and/or other types of workloads that are deemed to be less critical relative to the production workloads to be restarted.
As an initial consideration, some virtualized computing environments have a mechanism to define priority levels for restarting VMs, referred to herein as a restart priority or restart priority level. For instance, the HA utility 144 in the virtualized computing environment 100 can assign the following six (6) restart priority levels as listed below:
Various implementations may provide more or fewer (and/or different) restart priority levels than the six restart priority levels listed above. The restart priority levels specify the order in which VMs are to be restarted as a result of a failure, and each VM is assigned with a respective restart priority level. VMs with “highest” restart priority levels are restarted first; VMs with “lowest” restart priority levels are restarted last; and VMs with “disabled” restart priority levels are not restarted.
An example of a VM with a “highest” restart priority level is a network security workload or other type of critical workload that needs to be restarted quickly. An example of a VM with a “lowest” restart priority level is a test and development workload or other type of workload for which restart can be delayed for a relatively longer length of time. An example of a VM with a “medium” restart priority level is a routine business workload whose restart can be delayed somewhat but should still occur in a timely manner. An example of a VM with a “disabled” restart priority level is a workload that runs obsolete applications and is used infrequently (if at all).
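To illustrate how such restart priority levels might determine the restart order, the following is a minimal Python sketch that assumes an integer ranking in which a lower value restarts first; only the levels named above are shown, and the names RestartPriority and restart_order are hypothetical.

```python
# Minimal sketch of ordering VMs for restart by priority level. Only the
# levels named in the text are shown; a complete implementation would define
# all six levels. Names are hypothetical (illustration only).
from enum import IntEnum
from typing import Dict, List


class RestartPriority(IntEnum):
    HIGHEST = 0
    MEDIUM = 2    # intermediate levels omitted here
    LOWEST = 4
    DISABLED = 5  # VMs at this level are not restarted at all


def restart_order(vms: Dict[str, RestartPriority]) -> List[str]:
    """Return VM names in restart order, excluding "disabled" VMs."""
    eligible = {name: p for name, p in vms.items() if p is not RestartPriority.DISABLED}
    return sorted(eligible, key=eligible.get)


# Example: the security VM restarts first, the routine business VM next, the
# test/development VM last, and the obsolete VM is skipped entirely.
print(restart_order({
    "security-vm": RestartPriority.HIGHEST,
    "business-vm": RestartPriority.MEDIUM,
    "test-dev-vm": RestartPriority.LOWEST,
    "obsolete-vm": RestartPriority.DISABLED,
}))  # ['security-vm', 'business-vm', 'test-dev-vm']
```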
However, even if such restart priority levels are in place to specify a sequence/order for VMs to restart, there is no guarantee that a critical VM (e.g., having a restart priority level of “highest”) or any other VM will be able to restart at all. For instance, such VM(s) will not be able to restart if no resources are available for the VM(s) to use, despite the fact that the VMs have a “highest” restart priority level.
Explained in another way, the HA utility 144 may have an admission control feature that sets aside (reserves) resources for use by VMs that need to restart in the event of a failure. If, however, more failures occur than the admission control feature was configured to tolerate (e.g., the number of failures exceeds the capacity of the resources set aside by the admission control feature for use in restarting VMs), the result is that VMs cannot be restarted due to a lack of resources. This leads to a situation where relatively unimportant VMs are available (e.g., test and development workloads continue to run on the active hosts), but business-critical production VMs from the failed hosts are unavailable (e.g., are unable to restart on the active hosts). Therefore, the AER system/method of one embodiment solves this problem by powering off currently running VMs on the active host(s) so that the HA utility 144 can restart, on the active host(s), VMs that are more important than the currently running VMs.
With the AER system/method, the user (e.g., a system administrator) uses the management server 142 to specify whether a currently running workload (VM) can be powered off or not powered off for AER when a failure occurs and VMs from the failed host(s) are to be restarted. In one example implementation, the following powering off options/settings are available to be configured for VMs by the management server 142:
In some implementations, whether a VM is assigned a power off setting of “Will Power Off” (e.g., mandatory power off), “May Power Off” (e.g., optional power off), or “Never Power Off” (e.g., mandatory keep powered on) may be based on business logic, in that the most critical workloads (e.g., VMs performing network security) can be assigned with the “Never Power Off” setting, relatively less critical workloads (VMs performing routine day-to-day business processes) can be assigned with the “May Power Off” setting, and the least critical workloads (e.g., test and development workloads/VMs) can be assigned with the “Will Power Off” setting. In some implementations, the particular power off setting assigned to some VMs can be based on preferences of the user, alternatively or in addition to being based on business logic. For instance, the user may have some reason to keep a VM operational and therefore assigns a “Never Power Off” setting to that VM, even though that VM may not necessarily be performing business-critical tasks.
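The three power off settings, together with an illustrative assignment following the business logic described above, can be sketched as follows; the enum and the example workload names are hypothetical and do not correspond to any actual product API.

```python
# Minimal sketch of the three power off settings described above and an
# illustrative assignment by workload criticality (hypothetical names only).
from enum import Enum


class PowerOffSetting(Enum):
    WILL_POWER_OFF = "Will Power Off"    # mandatory power off (least critical)
    MAY_POWER_OFF = "May Power Off"      # optional power off
    NEVER_POWER_OFF = "Never Power Off"  # mandatory keep powered on (most critical)


# Example assignment based on the business logic described above.
power_off_settings = {
    "network-security-vm": PowerOffSetting.NEVER_POWER_OFF,
    "routine-business-vm": PowerOffSetting.MAY_POWER_OFF,
    "test-dev-vm": PowerOffSetting.WILL_POWER_OFF,
}
```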
According to the AER system/method, if a situation occurs where VMs from the failed host(s) cannot be restarted due to a lack of available unreserved resources in the active host(s), the HA utility 144 will first power off VMs on the active host(s) that are configured with the “Will Power Off” setting. The HA utility 144 will power off as many VMs on the active host(s) as needed to be able to restart all workloads from the failed host(s) with restart priority levels from “highest” to “lowest”. If, after all VMs on the active host(s) that have been configured with the “Will Power Off” setting have been powered off, there are still insufficient available resources on the active host(s) to restart all remaining workloads from the failed host(s), then the HA utility 144 will continue with powering off VMs on the active host(s) that are configured with the “May Power Off” setting. Again, the HA utility 144 will power off as many VMs on the active host(s) as needed to be able to restart the remainder of the VMs from the failed host(s) that need to be restarted.
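A minimal sketch of this two-phase selection is shown below, assuming that memory is the only resource being tracked and that each candidate VM on the active host(s) is represented by a (name, power off setting, memory) tuple; the function name select_power_off_victims is hypothetical.

```python
# Minimal sketch of the two-phase power-off selection described above: power
# off "Will Power Off" VMs first, then "May Power Off" VMs, and stop as soon
# as the freed memory covers what the failed VMs still need. Hypothetical
# names; a real HA utility would also track CPU and other resources.
WILL, MAY, NEVER = "Will Power Off", "May Power Off", "Never Power Off"


def select_power_off_victims(active_vms, memory_still_needed_gb):
    """active_vms: list of (name, power_off_setting, memory_gb) tuples."""
    victims, freed = [], 0.0
    for phase in (WILL, MAY):                      # two passes, in order
        for name, setting, memory_gb in active_vms:
            if freed >= memory_still_needed_gb:    # enough resources freed
                return victims, freed
            if setting == phase:
                victims.append(name)
                freed += memory_gb
    return victims, freed                          # may still be insufficient


# Example: 15 GB is still needed; the two "Will Power Off" VMs free only
# 12 GB, so one "May Power Off" VM is selected as well.
print(select_power_off_victims(
    [("vm-a", WILL, 6.0), ("vm-b", WILL, 6.0), ("vm-c", NEVER, 6.0), ("vm-d", MAY, 6.0)],
    memory_still_needed_gb=15.0,
))  # (['vm-a', 'vm-b', 'vm-d'], 18.0)
```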
If there are no VMs on the active host(s) that are left to power off, the HA utility 144 issues a warning (to be viewed by the system administrator) that all VMs on the active host(s) that were allowed to power off have been powered off, and that insufficient unreserved resources are available to restart the remaining VMs from the failed host(s).
Furthermore, if a situation occurs where the VMs from the failed host(s) that are configured with the “Never Power Off” setting have all been restarted by powering off a selection of VMs on the active host(s) that are configured with the “Will Power Off” setting, then the HA utility 144 attempts to restart the VMs from the failed host(s) that are configured with the “May Power Off” setting by powering off additional VMs on the active host(s) that are configured with the “Will Power Off” setting.
In the first example scenario of
Furthermore in the first example scenario of
Also for the cluster 200, 95% of the resources are reserved to support the currently running VMs. Thus, if each host has 50 GB of resources, 95% of those resources (47.5 GB) are currently reserved and in use by the currently running VMs and 5% (2.5 GB) are unreserved/available resources. Since each host has four running VMs, this means that each running VM utilizes 47.5 GB/4 = 11.875 GB.
The HA utility 144 then detects that a failure has occurred at host-03 206 (represented by an X placed on host-03 206 in
Therefore, the HA utility 144 creates a list of active/running VMs on the active host-01 202, host-02 204, and host-04 208 that are configured with the “Will Power Off” setting and with the “May Power Off” setting. For the active hosts shown in
The HA utility 144 will then issue a warning (which can be seen by the system administrator who accesses the management server 142 via the user device 146 in
Some further observations can be made from the first example scenario of
Also similar with respect to the first example scenario of
For the cluster 300, 60% of the resources are reserved to support the currently running VMs. Thus, if each host has 50 GB of resources, 60% of those resources (30 GB) are currently reserved and in use by the currently running VMs and 40% (20 GB) are unreserved/available resources. Since each host has four running VMs, this means that each running VM utilizes 30 GB/4 = 7.5 GB. So, each host has unreserved/available resources to support the restart of two additional VMs (e.g., 7.5 GB × 2 = 15 GB, which is within the 20 GB that is available at each host).
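The per-host arithmetic for this second example scenario can be checked with the following short computation, again assuming that memory is the only resource being tracked (illustration only).

```python
# Per-host capacity arithmetic for the second example scenario.
total_gb = 50.0
reserved_fraction = 0.60
running_vms_per_host = 4

reserved_gb = total_gb * reserved_fraction            # 30.0 GB in use
unreserved_gb = total_gb - reserved_gb                # 20.0 GB available
per_vm_gb = reserved_gb / running_vms_per_host        # 7.5 GB per running VM
extra_vms_per_host = int(unreserved_gb // per_vm_gb)  # 2 additional VMs fit

print(reserved_gb, unreserved_gb, per_vm_gb, extra_vms_per_host)  # 30.0 20.0 7.5 2
```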
The HA utility 144 then detects that a failure has occurred at host-02 304 and host-03 306 (represented by an X placed on host-02 304 and host-03 306 in
The HA utility 144 then powers on VMs from the failed hosts until the available resources at the active hosts can no longer support restarting additional VMs. More specifically, the HA utility 144 powers on VM6, VM8, VM9, and VM11 (which are prioritized since they are configured with the “Never Power Off” setting). Two of these VMs can be restarted at host-01 302, while the other two VMs can be restarted at host-04 308, thereby leaving 5 GB available at each host (e.g., 20 GB − 2 × 7.5 GB = 5 GB) after the restart.
Since the restarting of VM6, VM8, VM9, and VM11 has now effectively used up the available unreserved resources at host-01 302 and host-04 308, the HA utility 144 then creates a list of VMs on these active hosts that are configured as “Will Power Off” and “May Power Off”, so as to free up resources to power on the remaining VMs from the failed hosts. The HA utility 144 accordingly powers off VM1, VM3, and VM13 (which are all configured with the “Will Power Off” setting), and powers on VM12, VM5, and VM10 in their place at host-01 302 and host-04 308.
The HA utility 144 does not power off VM14 on host-04 308 and does not power on VM7 from the failed host-02 304, since both are equally configured with the “Will Power Off” setting—that is, the HA utility 144 does not favor powering off one VM for the benefit of another VM when these two VMs have equal power off settings. As such, the HA utility 144 issues a warning (via the management server 142 accessible by the system administrator) that VM7 has not been restarted due to a lack of available resources.
For instance, the previous second example scenario in
The HA utility 144 detects that a failure has occurred at host-02 404 and host-03 406 (shown by the X placed on these hosts in
Next, the HA utility 144 powers on the VMs with the “medium” restart priority level, which in this case are VM9 and VM11. These VMs may be powered on in sequence according to order of name (e.g., “9” comes before “11”), or they may be powered on simultaneously. These two VMs are powered on, for example, at host-04 408 that has 20 GB of resources that are available, thereby leaving 5 GB available after VM9 and VM11 are restarted at host-04 408. At this point, host-01 402 and host-04 408 no longer have sufficient available resources to enable restarting any further remaining VMs.
Therefore, the HA utility 144 creates a list of VMs on host-01 402 and host-04 408 that are configured with the “Will Power Off” and “May Power Off” settings. These VMs are VM1 and VM3 on host-01 402 and VM13 and VM14 on host-04 408, which are all configured with the “Will Power Off” setting (4 total VMs), and VM16 on host-04 408 that is configured with the “May Power Off” setting.
The HA utility 144 accordingly powers off VM1, VM3, and VM13 that have the “Will Power Off” setting, and powers on VM12, VM5, and VM10 in their place at the active hosts (e.g., three VMs powered off, and three VMs powered on). One observation with these powering off/on operations is that the VMs being powered off are less prioritized (e.g., configured with the “Will Power Off” setting), as compared to the VMs being powered on (e.g., VM12 configured with the “Never Power Off” setting, and VM5 and VM10 configured with the “May Power Off” setting). Another observation is that VM12 may be powered on before VM5 and VM10, given that VM12 is prioritized due to its “Never Power Off” setting.
The HA utility 144 does not restart VM7, since VM7 is configured with the “disabled” restart priority level. The HA utility 144 can send a warning, via the management server 142, to the system administrator, if it is appropriate to notify the system administrator that VM7 has not been restarted.
According to one embodiment, the method 500 may be performed by the management server 142 and its elements (such as the HA utility 144) in cooperation with the agent 140 and/or other elements (such as hypervisors) of hosts managed by the management server 142. In other embodiments, various other elements in a computing environment may perform, individually or cooperatively, the various operations of the method 500.
At a block 502 (“DETECT OCCURRENCE OF FAILURE”), the HA utility 144 detects that one or more hosts in a cluster managed by the management server 142 have failed. For example in
The block 502 may be followed by a block 504 (“IDENTIFY VIRTUALIZED COMPUTING INSTANCES (VCIS) TO BE RESTARTED”), wherein the HA utility 144 identifies and creates a list of VMs on the failed host(s) that potentially need to be restarted. The HA utility 144 also identifies the resource requirements of these VMs. The VMs that were running on the failed host(s) and their resource requirements may have been provided previously to the HA utility 144 by the agent 140 on the failed host(s), prior to the failure.
The block 504 may be followed by a block 506 (“RESTART (POWER ON) VCIS UNTIL INSUFFICIENT AVAILABLE RESOURCES ON ACTIVE HOST(S)”), wherein the HA utility 144 instructs the hypervisor(s) on the active host(s) to power on the VMs (which were previously running on the failed host(s)), until insufficient available resources remain on the active host(s). Referring back to the example scenarios of
The block 506 may be followed by a block 508 (“IDENTIFY ACTIVE VCIS ON ACTIVE HOST(S) THAT ARE CONFIGURED WITH WILL AND MAY POWER OFF SETTINGS”) in which the HA utility 144 obtains (via the agent 140) a list of active VMs on each active host and the respective power off settings of the VMs, such as “Will Power Off”, “May Power Off”, and “Never Power Off”. The block 508 may be followed by a block 510 (“POWER OFF ACTIVE VCIS WITH WILL POWER OFF SETTING, AND POWER ON VCIS, UNTIL FIRST CONDITION(S) MET”) in which the HA utility 144 instructs the hypervisor(s) at the active host(s) to power off active VMs that have been configured with the “Will Power Off” setting, and then VMs from the failed host(s) are powered on to replace the powered-off VMs.
The active VMs (with the “Will Power Off” setting) are powered off and the VMs from the failed host(s) are powered on in their place at the block 510, until one or more first conditions are met. For instance, a first condition may be that a number of VMs (with the “Will Power Off” setting) are powered off until the amount of resources that they free up (e.g., become unreserved) is sufficient to enable a restart of the remaining failed VMs—the next active VM with the “Will Power Off” setting therefore does not need to be powered off. Another example of the first condition is that a number of VMs (with the “Will Power Off” setting) are powered off until the next active VM has the same/equal power off setting (e.g., the “Will Power Off” setting) as the remaining VM(s) to power on—since both of these VMs are equally configured with the “Will Power Off” setting, no action need be taken to power off one of the VMs in favor of powering on the other VM.
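These first conditions (and, analogously, the second conditions at the block 516 described below) can be expressed as a simple check, sketched here with a hypothetical helper name and with the power off settings compared as plain strings.

```python
# Minimal sketch of the stop conditions for block 510 (and, analogously,
# block 516). Hypothetical helper name; illustration only.
def should_stop_powering_off(freed_gb, still_needed_gb,
                             next_victim_setting, next_restart_setting):
    """Stop when enough memory has been freed, or when the next VM that would
    be powered off has the same power off setting as the VM waiting to be
    powered on (one VM is not sacrificed for an equally ranked VM)."""
    if freed_gb >= still_needed_gb:
        return True
    return next_victim_setting == next_restart_setting


# Example from the second scenario above: VM14 ("Will Power Off") is not
# powered off to make room for VM7, which is also set to "Will Power Off".
print(should_stop_powering_off(0.0, 7.5, "Will Power Off", "Will Power Off"))  # True
```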
The block 510 may be followed by a block 512 (“REMAINING VCIS TO POWER ON?”) wherein the HA utility 144 determines whether there are any remaining VMs from the failed host(s) that need to be powered on. If there are no further VMs to power on (“NO” at the block 512), then the method 500 ends at a block 514. However, if there are further VMs from the failed host(s) that need to be powered on (“YES” at the block 512), then the method 500 proceeds to a block 516 (“POWER OFF ACTIVE VCIS WITH MAY POWER OFF SETTING, AND POWER ON VCIS, UNTIL SECOND CONDITION(S) MET”).
Specifically, at the block 516 (and similar to the block 510), the HA utility 144 instructs the hypervisor(s) on the active host(s) to power off active VMs (with the “May Power Off” setting) and the VMs from the failed host(s) are powered on in their place, until one or more second conditions are met. For instance, a second condition may be that a number of VMs (with the “May Power Off” setting) are powered off until the amount of resources that they free up (e.g., become unreserved) is sufficient to enable a restart of the remaining failed VMs—the next active VM with the “May Power Off” setting therefore does not need to be powered off. Another example of the second condition is that a number of VMs (with the “May Power Off” setting) are powered off until the next active VM has the same/equal power off setting (e.g., the “May Power Off” setting) as the remaining VM(s) to power on—since both of these VMs are equally configured with the “May Power Off” setting, no action need be taken to power off one of the VMs in favor of powering on the other VM.
The block 516 may be followed by a block 518 (“REMAINING VCIS TO POWER ON?”) wherein the HA utility 144 determines whether there are any further remaining VMs from the failed host(s) that need to be powered on. If there are no further VMs to power on (“NO” at the block 518), then the method 500 ends at the block 514. However, if there are further VMs from the failed host(s) that need to be powered on (“YES” at the block 518), then the method 500 proceeds to a block 520 (“ISSUE A WARNING”) wherein the HA utility 144 issues a warning to the system administrator to indicate that there are remaining VMs that need to be restarted but are unable to be restarted, due to insufficient resources on the active host(s).
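To tie the blocks of the method 500 together, the following is a minimal end-to-end sketch in Python. It assumes that memory is the only tracked resource, that failure detection (block 502) and the actual hypervisor power operations are handled elsewhere, and that VMs with the “disabled” restart priority level have already been excluded; the class and function names are hypothetical, and the per-VM greedy placement shown here is a simplification of the cluster-wide planning that an actual HA utility would perform.

```python
# Minimal sketch of blocks 504-520 of the method 500 (hypothetical names).
from dataclasses import dataclass
from typing import List

WILL, MAY, NEVER = "Will Power Off", "May Power Off", "Never Power Off"


@dataclass
class VM:
    name: str
    memory_gb: float
    power_off_setting: str
    restart_priority: int  # lower value restarts first; "disabled" VMs excluded upstream


@dataclass
class Host:
    name: str
    unreserved_gb: float
    running: List[VM]


def restart_failed_vms(failed_vms: List[VM], active_hosts: List[Host]) -> List[str]:
    """Restart failed VMs, powering off lower-priority active VMs when needed;
    returns warnings for any VMs that could not be restarted (block 520)."""
    warnings: List[str] = []
    setting_rank = {NEVER: 0, MAY: 1, WILL: 2}  # lower rank = more critical
    # Block 504: order the failed VMs by restart priority level, then by how
    # critical their power off setting is.
    pending = sorted(failed_vms, key=lambda v: (v.restart_priority,
                                                setting_rank[v.power_off_setting]))

    def try_place(vm: VM) -> bool:
        # Block 506: restart on any active host with enough unreserved memory.
        for host in active_hosts:
            if host.unreserved_gb >= vm.memory_gb:
                host.unreserved_gb -= vm.memory_gb
                host.running.append(vm)
                return True
        return False

    for vm in pending:
        if try_place(vm):
            continue
        placed = False
        # Blocks 508-516: free resources by powering off "Will Power Off" VMs
        # first, then "May Power Off" VMs, but never power off a VM whose
        # setting is as critical as (or more critical than) the VM to restart.
        for phase in (WILL, MAY):
            if setting_rank[vm.power_off_setting] >= setting_rank[phase]:
                break
            for host in active_hosts:
                for victim in [r for r in host.running if r.power_off_setting == phase]:
                    host.running.remove(victim)
                    host.unreserved_gb += victim.memory_gb
                    if try_place(vm):
                        placed = True
                        break
                if placed:
                    break
            if placed:
                break
        if not placed:
            # Block 520: warn that this VM could not be restarted.
            warnings.append(f"{vm.name} could not be restarted: insufficient resources")
    return warnings
```

In this sketch, a VM is never powered off for the benefit of a VM with an equal or less critical power off setting, mirroring the tie handling described in the example scenarios above.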
The above examples can be implemented by hardware (including hardware logic circuitry), software or firmware or a combination thereof. The above examples may be implemented by any suitable computing device, computer system, etc. The computing device may include processor(s), memory unit(s) and physical NIC(s) that may communicate with each other via a communication bus, etc. The computing device may include a non-transitory computer-readable medium having stored thereon instructions or program code that, in response to execution by the processor, cause the processor to perform processes described herein with reference to
The techniques introduced above can be implemented in special-purpose hardwired circuitry, in software and/or firmware in conjunction with programmable circuitry, or in a combination thereof. Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and others. The term ‘processor’ is to be interpreted broadly to include a processing unit, ASIC, logic unit, or programmable gate array etc.
Although examples of the present disclosure refer to “virtual machines,” it should be understood that a virtual machine running within a host is merely one example of a “virtualized computing instance” or “workload.” A virtualized computing instance may represent an addressable data compute node or isolated user space instance. In practice, any suitable technology may be used to provide isolated user space instances, not just hardware virtualization. Other virtualized computing instances (VCIs) may include containers (e.g., running on top of a host operating system without the need for a hypervisor or separate operating system; or implemented as an operating system level virtualization), virtual private servers, client computers, etc. The virtual machines may also be complete computation environments, containing virtual equivalents of the hardware and system software components of a physical computing system. Moreover, some embodiments may be implemented in other types of computing environments (which may not necessarily involve a virtualized computing environment), wherein it would be beneficial to power on/off certain computing elements after a failure, dependent on resource availability and priorities of the computing elements.
The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or any combination thereof.
Some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computing systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof; designing the circuitry and/or writing the code for the software and/or firmware are possible in light of this disclosure.
Software and/or other instructions to implement the techniques introduced here may be stored on a non-transitory computer-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors. A “computer-readable storage medium”, as the term is used herein, includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant (PDA), mobile device, manufacturing tool, any device with a set of one or more processors, etc.). A computer-readable storage medium may include recordable/non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), magnetic disk or optical storage media, flash memory devices, etc.).
The drawings are only illustrations of an example, wherein the units or procedures shown in the drawings are not necessarily essential for implementing the present disclosure. The units in the device in the examples can be arranged in the device as described in the examples, or can alternatively be located in one or more devices different from that in the examples. The units in the examples described can be combined into one module or further divided into a plurality of sub-units.