Unless otherwise indicated, the subject matter described in this section is not prior art to the claims of the present application and is not admitted as being prior art by inclusion in this section.
Live migration is a virtual machine (VM) provisioning operation that enables a VM to be moved from one host system to another while the VM remains operational. Live migration provides a number of important benefits for virtual infrastructure deployments, such as the ability to dynamically load balance compute workloads across host systems and the ability to carry out proactive host maintenance with minimal VM downtime.
In a typical live migration of a VM from a source host system to a destination host system, there is a brief time window (known as the “switchover” point) during which the VM is stunned (i.e., quiesced) on the source host system and its execution state is transferred to the destination host system. This switchover window typically lasts one second or less. Once the execution state transfer is complete, a migrated copy of the VM is resumed on the destination host system.
Unfortunately, many latency-sensitive applications (e.g., voice over IP (VoIP) applications, clustered database applications, etc.) cannot gracefully handle an unexpected disruption of even one second to their runtime operation; they will often fail or become unresponsive when encountering such a disruption. As a result, live migration is generally disabled in existing virtual infrastructure deployments for VMs that run these types of guest applications.
In the following description, for purposes of explanation, numerous examples and details are set forth in order to provide an understanding of various embodiments. It will be evident, however, to one skilled in the art that certain embodiments can be practiced without some of these details or can be practiced with modifications or equivalents thereof.
Embodiments of the present disclosure are directed to techniques for implementing application-assisted VM provisioning operations, and in particular application-assisted live migration. As used herein, “application-assisted live migration” is a mechanism whereby a hypervisor of a source host system can notify a guest application that the VM within which the guest application runs will be imminently live migrated from the source host system to a destination host system, prior to actually carrying out the live migration. In response, the guest application can execute one or more remedial actions that mitigate or avoid issues which may arise with respect to its runtime operation when the VM is stunned and switched over to the destination host system. These actions can include, e.g., ensuring the completion of certain in-progress tasks, delaying the initiation of certain tasks, modifying its fault tolerance behavior, and/or others. The guest application can then return an acknowledgement message to the hypervisor upon completing the remedial actions, and the hypervisor can thereafter proceed with live migrating the VM.
With the foregoing mechanism, latency-sensitive guest applications can be made aware of—and thus gracefully prepare for—the disruption that occurs at the switchover point of a live migration event, rather than simply failing or becoming unresponsive. Accordingly, the techniques of the present disclosure advantageously allow live migration to be applied to VMs that run such applications, thereby preserving the important benefits of this virtualization feature.
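By way of illustration, the following Python sketch shows how a guest application might handle a migration start notification under this mechanism. The channel object, the JSON message format, and the remedial-action hook names are assumptions made for this example only; the disclosure does not prescribe any particular guest-to-hypervisor interface.

```python
# Minimal, hypothetical guest-side sketch of the handshake described above.
import json


class MigrationAwareApp:
    """Latency-sensitive guest application that prepares for an imminent live migration."""

    def __init__(self, channel):
        self.channel = channel  # hypothetical guest-to-hypervisor notification channel

    def on_migration_start(self) -> None:
        # Remedial actions: drain in-flight work, defer new work, and relax
        # failure detection so the switchover stun is not mistaken for a failure.
        self.drain_in_progress_tasks()
        self.defer_new_tasks()
        self.extend_failure_timeouts()
        # Acknowledge so the hypervisor can proceed with the live migration.
        self.channel.send(json.dumps({"type": "migration-ack"}).encode())

    # Application-specific hooks, left as no-ops in this sketch.
    def drain_in_progress_tasks(self) -> None:
        pass

    def defer_new_tasks(self) -> None:
        pass

    def extend_failure_timeouts(self) -> None:
        pass
```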
Each host system 104/106 includes, in software, a hypervisor 108/110 (i.e., source hypervisor 108 and destination hypervisor 110, respectively) that provides an execution environment for running one or more VMs. Each of hypervisors 108 and 110 may be a bare-metal hypervisor, a hosted hypervisor, or any other type of hypervisor known in the art. In addition, source host system 104 includes a VM 112 running on top of source hypervisor 108 that comprises a latency-sensitive guest application 114. A latency-sensitive application is one that must respond very quickly (e.g., on the order of a few milliseconds or less) to events as part of its runtime operation in order to function correctly or as expected. Examples of latency-sensitive applications include VoIP applications, clustered database applications, financial (e.g., high-frequency trading) applications, real-time medical imaging applications, and so on.
In
Starting with block 202 of flowchart 200, source hypervisor 108 can start a migration pre-copy phase in which source hypervisor 108 reads the guest memory pages of VM 112 and transmits the data of these guest memory pages to destination hypervisor 110 while VM 112 continues running on source host system 104. Upon receiving the data for each guest memory page, destination hypervisor 110 can write the data to a destination-side host memory (block 204). Although not explicitly shown in
At the conclusion of the pre-copy phase, source hypervisor 108 can stun VM 112 on source host system 104 and transmit a minimal subset of the VM's current execution state (e.g., CPU state and registers) to destination hypervisor 110 (block 206). Destination hypervisor 110 can then power-on migrated VM 112′ on destination host system 106, which causes VM 112′ to begin running using the execution and memory state copied over in the previous steps (and thereby “switches over” VM execution from the source side to the destination side) (block 208).
Concurrently with block 208, source hypervisor 108 can initiate a post-migration page-in phase during which source hypervisor 108 sends to destination hypervisor 110 any remaining dirty guest memory pages for VM 112 that were not copied over during the pre-copy phase (block 210). As part of this page-in phase, if the guest OS of migrated VM 112′ attempts to access a guest memory page on the destination host side that has not yet been received from source host system 104, destination hypervisor 110 can generate and send a remote page fault to source hypervisor 108 identifying that guest memory page. In response, source hypervisor 108 can immediately read and send the faulted guest memory page to destination hypervisor 110 for consumption by migrated VM 112′.
Finally, once all of the remaining dirty memory pages of VM 112 have been copied over per block 210, the live migration process is considered complete and flowchart 200 can end.
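For reference, the conventional flow described above can be outlined as follows. This is a simplified, hypothetical sketch: the function and object names (read_page, write_page, stun, and so on) are placeholders, and a real hypervisor performs these phases in privileged code with additional machinery such as dirty-page tracking.

```python
# Illustrative outline of the conventional live migration flow described above
# (pre-copy, switchover, post-migration page-in). All names are placeholders.
def live_migrate(vm, src, dst):
    # Pre-copy phase: iteratively copy guest memory while the VM keeps running
    # on the source host, re-copying pages the guest dirties along the way.
    dirty_pages = set(vm.guest_memory_pages())
    while len(dirty_pages) > vm.switchover_threshold:
        for page in dirty_pages:
            dst.write_page(page, src.read_page(page))
        dirty_pages = src.pages_dirtied_since_last_pass(vm)

    # Switchover: stun the VM and transfer a minimal execution state (e.g., CPU state
    # and registers) to the destination.
    src.stun(vm)
    dst.restore_execution_state(src.capture_execution_state(vm))
    dst.resume(vm)  # the migrated copy begins running on the destination host

    # Page-in phase: push remaining dirty pages; pages the migrated VM touches
    # before they arrive are requested via remote page faults and sent immediately.
    for page in dirty_pages:
        dst.write_page(page, src.read_page(page))
```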
As noted in the Background section, one issue with this live migration process is that latency-sensitive applications like guest application 114 of
To address the foregoing and other similar problems,
Starting with steps (1) and (2) of
At steps (3) and (4), notification handler 304 can receive the migration start notification and pass it to latency-sensitive guest application 114, which can take one or more remedial actions in anticipation of the live migration event while notification manager 302 waits for an acknowledgement. Generally speaking, these actions can prepare the application for the disruption that will occur when VM 112 is stunned on source host system 104 and thereby avoid application failure or other undesirable outcomes. In certain embodiments, the remedial actions can include one or more of the following:
- ensuring the completion of certain in-progress tasks prior to the switchover;
- delaying the initiation of certain tasks, or temporarily quiescing certain portions of the application's functionality, until the migration is finished; and
- modifying the application's fault tolerance behavior (e.g., relaxing failure-detection or heartbeat timeouts) so that the stun at the switchover point is not misinterpreted as a failure.
As can be seen, the specific action(s) undertaken at step (4) may differ depending on the nature of latency-sensitive guest application 114 and the potential problems being mitigated/addressed.
Upon completing the remedial actions, latency-sensitive guest application 114 can send, via notification handler 304, an acknowledgement to notification manager 302 indicating that the application is ready for the live migration to move forward (steps (5) and (6)).
Finally, at step (7), notification manager 302 can receive the acknowledgement and source hypervisor 108 (in conjunction with destination hypervisor 110) can proceed with live migrating VM 112 from source host system 104 to destination host system 106 per the conventional live migration process shown in
The remaining sections of this disclosure provide additional details for implementing the high-level workflow of
Further, although notification handler 304 is shown in
Yet further, in some embodiments VM 112 may include multiple latency-sensitive guest applications, each of which is configured to receive and process migration start notifications from notification manager 302. In these embodiments, source hypervisor 108 may delay the live migration of VM 112 until all such applications have returned an acknowledgement to notification manager 302 indicating that they are ready for the live migration to proceed. One of ordinary skill in the art will recognize other variations, modifications, and alternatives.
Starting with block 402, upon startup/initialization of latency-sensitive guest application 114, notification handler 304 can send (via, e.g., a remote procedure call (RPC) or some other communication mechanism) a registration request to notification manager 302 identifying guest application 114 and the type of VM provisioning operation for which application 114 wishes to receive notifications (i.e., live migration).
In response, notification manager 302 can create and store a registration entry indicating that guest application 114 is now subscribed to receive notifications for the live migration of VM 112 (block 404). In addition, notification manager 302 can return a registration acknowledgement to notification handler 304 comprising a notification timeout value (block 406). This value indicates the maximum amount of time that source hypervisor 108 will wait, after generating/sending out a migration start notification, before proceeding with the live migration operation. Accordingly, the notification timeout value allows latency-sensitive guest application 114 to know how much time it has to complete its remedial actions.
Finally, at block 408, notification handler 304 can communicate the notification timeout value to latency-sensitive guest application 114 and the flowchart can end.
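As an illustration of blocks 402 through 408, a guest-side registration call might look like the sketch below. The RPC payload fields, the use of JSON, and the rpc_channel object are assumptions for this example; the disclosure only requires that the request identify the application and the operation of interest and that the reply carry the notification timeout value.

```python
# Hypothetical sketch of the guest-side registration handshake.
import json


def register_for_migration_notifications(rpc_channel, app_id: str) -> float:
    """Subscribe app_id to live migration notifications and return the timeout value."""
    request = {"op": "register", "application": app_id, "event": "live-migration"}
    rpc_channel.send(json.dumps(request).encode())

    reply = json.loads(rpc_channel.recv())
    # The timeout tells the application how long it has to complete its remedial
    # actions before the hypervisor proceeds with the migration regardless.
    return float(reply["notification_timeout_secs"])
```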
At block 502, VI management server 102 can initiate a live migration of VM 112 from source host system 104 to destination host system 106 and send a migration start message to source host system 104.
At block 504, notification manager 302 of source hypervisor 108 can receive the migration start message, identify latency-sensitive guest application 114 as being registered to receive notifications regarding this live migration event, and send (or generate) a migration start notification for consumption by notification handler 304.
At block 506, notification handler 304 can receive (either via a pull or push mechanism) the migration start notification generated by notification manager 302 and can provide the notification to latency-sensitive guest application 114. In response, application 114 can execute its remedial actions for preparing for the live migration event (block 508). As mentioned previously, these remedial actions can include, e.g., ensuring the completion of certain tasks, quiescing certain portions of its functionality, modifying its fault tolerance behavior, etc.
Although not shown in
Once latency-sensitive guest application 114 has finished its remedial actions, it can send, via notification handler 304, an acknowledgement to notification manager 302 (block 510). In response, source and destination hypervisors 108 and 110 can proceed with live migrating VM 112 (block 512).
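The source-side behavior of blocks 504 through 512, combined with the notification timeout established at registration and the multi-application case noted earlier, might be sketched as follows. The polling loop and all function names are hypothetical simplifications of logic that would reside in notification manager 302.

```python
# Hypothetical sketch of the source-side notify-and-wait logic.
import time


def notify_and_wait(registered_apps, send_notification, ack_received, timeout_secs):
    """Notify each registered guest application, then wait for all acks or a timeout."""
    for app in registered_apps:
        send_notification(app, {"type": "migration-start"})

    deadline = time.monotonic() + timeout_secs
    pending = set(registered_apps)
    while pending and time.monotonic() < deadline:
        pending = {app for app in pending if not ack_received(app)}
        time.sleep(0.05)  # simple polling; an event-driven manager would block instead

    # Whether or not every application acknowledged in time, the hypervisor can now
    # proceed with the live migration; the timeout bounds how long it is delayed.
```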
After some period of time, the live migration will be completed (or at least the switchover of VM 112 from source host system 104 to destination host system 106 will have been executed). At this point, a copy of notification manager 302 running on destination host system 106 can send/generate a migration end notification to migrated guest application 114′ (block 514).
Finally, at blocks 516 and 518, the destination-side notification handler within migrated VM 112′ can receive and pass the migration end notification to migrated guest application 114′, which can execute steps for rolling back/reverting the remedial actions previously taken at block 508. In this way, the application can return to its normal mode of operation. In the scenario where the live migration fails and VM 112/guest application 114 needs to continue running on source host system 104, source-side notification manager 302 can send/generate the migration end notification to guest application 114, which can then roll back/revert the remedial actions on the source side.
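A corresponding sketch of the migration end path of blocks 514 through 518 appears below. Which notification manager sends the end notification depends on where VM 112 ultimately continues running (the destination on success, the source on failure); as before, the message format and the guest-side hook names are hypothetical.

```python
# Hypothetical sketch of the migration end notification and guest-side rollback.
import json


def send_migration_end(manager_channel) -> None:
    """Notification manager side: signal that the migration (or its failure) is resolved."""
    manager_channel.send(json.dumps({"type": "migration-end"}).encode())


def on_migration_end(app) -> None:
    """Guest application side: roll back the remedial actions taken at migration start."""
    app.restore_failure_timeouts()  # re-tighten heartbeat/failure detection
    app.resume_new_tasks()          # begin accepting new work again
```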
In addition, although not shown in
Certain embodiments described herein involve a hardware abstraction layer on top of a host computer. The hardware abstraction layer allows multiple containers to share the hardware resource. These containers, isolated from each other, have at least a user application running therein. The hardware abstraction layer thus provides benefits of resource isolation and allocation among the containers. In the foregoing embodiments, virtual machines are used as an example for the containers and hypervisors as an example for the hardware abstraction layer. As described above, each virtual machine includes a guest operating system in which at least one application runs. It should be noted that these embodiments may also apply to other examples of containers, such as containers not including a guest operating system, referred to herein as “OS-less containers” (see, e.g., www.docker.com). OS-less containers implement operating system-level virtualization, wherein an abstraction layer is provided on top of the kernel of an operating system on a host computer. The abstraction layer supports multiple OS-less containers each including an application and its dependencies. Each OS-less container runs as an isolated process in user space on the host operating system and shares the kernel with other containers. The OS-less container relies on the kernel's functionality to make use of resource isolation (CPU, memory, block I/O, network, etc.) and separate namespaces and to completely isolate the application's view of the operating environments. By using OS-less containers, resources can be isolated, services restricted, and processes provisioned to have a private view of the operating system with their own process ID space, file system structure, and network interfaces. Multiple containers can share the same kernel, but each container can be constrained to only use a defined amount of resources such as CPU, memory, and I/O.
Further, certain embodiments described herein can employ various computer-implemented operations involving data stored in computer systems. For example, these operations can require physical manipulation of physical quantities—usually, though not necessarily, these quantities take the form of electrical or magnetic signals, where they (or representations of them) are capable of being stored, transferred, combined, compared, or otherwise manipulated. Such manipulations are often referred to in terms such as producing, identifying, determining, comparing, etc. Any operations described herein that form part of one or more embodiments can be useful machine operations.
Yet further, one or more embodiments can relate to a device or an apparatus for performing the foregoing operations. The apparatus can be specially constructed for specific required purposes, or it can be a generic computer system comprising one or more general purpose processors (e.g., Intel or AMD x86 processors) selectively activated or configured by program code stored in the computer system. In particular, various generic computer systems may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations. The various embodiments described herein can be practiced with other computer system configurations including handheld devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
Yet further, one or more embodiments can be implemented as one or more computer programs or as one or more computer program modules embodied in one or more non-transitory computer readable storage media. The term non-transitory computer readable storage medium refers to any storage device, based on any existing or subsequently developed technology, that can store data and/or computer programs in a non-transitory state for access by a computer system. Examples of non-transitory computer readable media include a hard drive, network attached storage (NAS), read-only memory, random-access memory, flash-based nonvolatile memory (e.g., a flash memory card or a solid state disk), persistent memory, an NVMe device, a CD (Compact Disc) (e.g., CD-ROM, CD-R, CD-RW, etc.), a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The non-transitory computer readable media can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
In addition, while certain virtualization methods referenced herein have generally assumed that virtual machines present interfaces consistent with a particular hardware system, persons of ordinary skill in the art will recognize that the methods referenced can be used in conjunction with virtualizations that do not correspond directly to any particular hardware system. Virtualization systems in accordance with the various embodiments, implemented as hosted embodiments, non-hosted embodiments or as embodiments that tend to blur distinctions between the two, are all envisioned. Furthermore, certain virtualization operations can be wholly or partially implemented in hardware.
Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest operating system that performs virtualization functions. Plural instances can be provided for components, operations, or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the present disclosure. In general, structures and functionality presented as separate components in exemplary configurations can be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component can be implemented as separate components.
As used in the description herein and throughout the claims that follow, “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
The above description illustrates various embodiments along with examples of how aspects of particular embodiments may be implemented. These examples and embodiments should not be deemed to be the only embodiments and are presented to illustrate the flexibility and advantages of particular embodiments as defined by the following claims. Other arrangements, embodiments, implementations, and equivalents can be employed without departing from the scope hereof as defined by the claims.