In a virtualized computing environment, the underlying computer hardware is isolated from the operating system and application software of one or more virtualized entities. The virtualized entities, referred to as virtual machines, can thereby share the hardware resources while appearing to, and interacting with, users as individual computer systems. For example, a server can concurrently execute multiple virtual machines, whereby each of the multiple virtual machines behaves as an individual computer system but shares the resources of the server with the other virtual machines.
In a virtualized computing environment, the host machine is the actual physical machine, and the guest system is the virtual machine. The host system allocates a certain amount of its physical resources to each of the virtual machines so that each virtual machine can use the allocated resources to execute applications, including operating systems (referred to as “guest operating systems”). The host system can include physical devices (such as a graphics card, a memory storage device, or a network interface device) that, when virtualized, include a corresponding virtual function for each virtual machine executing on the host system. As such, the virtual functions provide a conduit for sending and receiving data between the physical device and the virtual machines. Thus, virtualized computing environments support efficient use of computer resources but also require careful management of those resources to ensure secure and proper operation of each of the virtual machines.
The present disclosure can be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
Part of managing virtualized computing environments involves the migration of virtual machines. Virtual machine migration refers to the process of moving a running virtual machine or application between different physical devices without disconnecting the client or the application. Memory, storage, and network connectivity of the virtual machine are transferred from the source host machine to the destination host machine.
Virtual machines can be migrated both live and offline. An offline migration suspends the guest virtual machine and then moves an image of the virtual machine's memory from the source host machine to the destination host machine. The virtual machine is then resumed on the destination host machine, and the memory used by the virtual machine on the source host machine is freed. Live migration provides the ability to move a running virtual machine between physical hosts with no interruption to service. The virtual machine remains powered on and user applications continue to run while the virtual machine is relocated to a new physical host. In the background, the virtual machine's random-access memory (“RAM”) is copied from the source host machine to the destination host machine. Storage and network connectivity are not altered. The migration process moves the virtual machine's memory, and the disk volume associated with the virtual machine can also be migrated. However, existing virtual machine migration techniques save the entire virtual function context and also require re-initialization of the destination host device in order to restore the saved virtual function context. The present disclosure relates to implementing a migration of a virtual machine from a source GPU to a target GPU.
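As context for the live-migration approach described above, the following is a minimal C++ sketch of the iterative "pre-copy" loop commonly used to move a running guest's RAM in the background. Every identifier here is a hypothetical stand-in rather than part of the disclosed technique.

```cpp
#include <cstddef>
#include <set>

// Hypothetical model of guest RAM: we track only which pages the running
// guest has written since the last copy round.
struct VmRam {
    std::set<std::size_t> dirty_pages;
};

// Stub: copy one page of guest RAM to the destination host.
void copy_page_to_destination(std::size_t /*page*/) { /* send over the network */ }

// Atomically take the current dirty set so the guest can keep running
// while the copied pages are in flight.
std::set<std::size_t> drain_dirty_set(VmRam& ram) {
    std::set<std::size_t> d;
    d.swap(ram.dirty_pages);
    return d;
}

// Iterative pre-copy: copy all pages once, then keep re-copying whatever
// the running guest dirties until the remainder is small enough to move
// during a brief final pause.
void live_migrate_ram(VmRam& ram, std::size_t total_pages, std::size_t stop_threshold) {
    for (std::size_t p = 0; p < total_pages; ++p)
        copy_page_to_destination(p);                  // round 0: everything

    std::set<std::size_t> dirty = drain_dirty_set(ram);
    while (dirty.size() > stop_threshold) {           // convergence loop
        for (std::size_t p : dirty) copy_page_to_destination(p);
        dirty = drain_dirty_set(ram);
    }

    // Final stop-and-copy: pause the guest, copy the small remainder,
    // then resume the virtual machine on the destination host.
    for (std::size_t p : dirty) copy_page_to_destination(p);
}
```

The loop converges when the guest dirties pages more slowly than they can be copied; the stop_threshold bounds how much work remains for the brief final pause.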
The GPU 115 is used to create visual images intended for output to a display (not shown) according to some embodiments. In some embodiments, the GPU 115 is used to provide additional or alternate functionality, such as compute functionality in which highly parallel computations are performed. The GPU 115 includes an internal (or on-chip) memory that includes a frame buffer and a local data store (LDS) or global data store (GDS), as well as caches, registers, or other buffers utilized by the compute units or any fixed-function units of the GPU. In some embodiments, the GPU 115 operates as a physical function that supports one or more virtual functions 119a-119n. The virtual environment implemented on the GPU 115 also provides virtual functions 119a-119n to other virtual components implemented on a physical machine. A single physical function implemented in the GPU 115 is used to support one or more virtual functions. The single root input/output virtualization (“SR-IOV”) specification allows multiple virtual machines to share a single bus interface, such as a peripheral component interconnect express (“PCI Express”) bus, with the GPU 115. For example, the GPU 115 can use dedicated portions of a bus (not shown) to securely share a plurality of virtual functions 119a-119n using SR-IOV standards defined for a PCI Express bus.
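To make the physical-function/virtual-function relationship concrete, here is a hypothetical C++ model of a single physical function exposing several virtual functions under SR-IOV. The types, the VF count, and the requester-ID arithmetic are illustrative simplifications; real hardware derives VF routing IDs from the SR-IOV capability's First VF Offset and VF Stride fields.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of one virtual function; field names are illustrative.
struct VirtualFunction {
    std::uint8_t  vf_index;      // corresponds to 119a..119n in the text
    std::uint16_t requester_id;  // each VF has a distinct PCIe requester ID
    bool          enabled = false;
};

// Hypothetical model of the physical function that backs the VFs.
struct PhysicalFunction {
    static constexpr int kMaxVfs = 16;
    std::array<VirtualFunction, kMaxVfs> vfs{};

    // Enable the first num_vfs virtual functions, as a host driver would
    // when programming the PF's SR-IOV capability with a VF count.
    void enable_sriov(int num_vfs, std::uint16_t pf_requester_id) {
        for (int i = 0; i < num_vfs && i < kMaxVfs; ++i) {
            vfs[i].vf_index     = static_cast<std::uint8_t>(i);
            vfs[i].requester_id = static_cast<std::uint16_t>(pf_requester_id + 1 + i);
            vfs[i].enabled      = true;
        }
    }
};
```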
Components access the virtual functions 119a-119n by transmitting requests over the bus. The physical function allocates the virtual functions 119a-119n to different virtual components in the physical machine on a time-sliced basis. For example, the physical function allocates a first virtual function 119a to a first virtual component in a first time interval 123a and a second virtual function 119b to a second virtual component in a second, subsequent time interval 123b.
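The time-sliced allocation just described behaves like a round-robin scheduler. The sketch below, with hypothetical names, shows one way such an allocator could rotate GPU ownership across the enabled virtual functions, one per interval (123a, 123b, and so on).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative round-robin scheduler: each enabled virtual function owns
// the GPU for one time interval in turn. Assumes at least one enabled VF.
class TimeSliceScheduler {
    std::vector<int> enabled_vfs_;  // indices of virtual functions 119a-119n
    std::size_t next_ = 0;
public:
    explicit TimeSliceScheduler(std::vector<int> vfs)
        : enabled_vfs_(std::move(vfs)) {}

    // Returns the virtual function that owns the next time interval.
    int next_interval_owner() {
        int vf = enabled_vfs_[next_];
        next_ = (next_ + 1) % enabled_vfs_.size();
        return vf;
    }
};
```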
In some embodiments, each of the virtual functions 119a-119n shares one or more physical resources of a source computing device 105 with the physical function and the other virtual functions 119a-119n. Software resources associated with data transfer are directly available to a respective one of the virtual functions 119a-119n during a specific time slice 123a-123n and are isolated from use by the other virtual functions 119a-119n or the physical function.
Various embodiments of the present disclosure facilitate a migration of virtual machines 121a-121n from the source computing device 105 to another by transferring states associated with at least one virtual function 119a-119n from a source GPU 115 to a destination GPU (not shown). The migration of the state associated with the at least one virtual function 119a-119n involves only the transfer of the data required for re-initialization of the respective one of the virtual functions 119a-119n at the destination GPU. For example, in some embodiments, a source computing device is configured to execute a plurality of virtual machines 121a-121n. In this exemplary embodiment, each of the plurality of virtual machines 121a-121n is associated with at least one virtual function 119a-119n. In response to receiving a migration request, the source computing device 105 is configured to save a state associated with a preempted one of the virtual functions 119a-119n for transfer to a destination computing device (not shown). In some embodiments, the state associated with the preempted virtual function 119a is a subset of a plurality of states associated with the plurality of virtual machines 121a-121n.
For example, when a respective one of the virtual machines 121a-121n is being executed on a source computing device 105 associated with a GPU 115 and a migration request is initiated, the GPU 115 is instructed, in response to the migration request, to identify and preempt the respective one of the virtual functions 119a-119n executing during the time interval 123a in which the migration request occurred, and to save the context associated with the preempted virtual function 119a. For example, the context associated with the preempted virtual function 119a includes a point indicating where a command is stopped prior to completion of the command's execution, a status associated with the preempted virtual function 119a, a status associated with the interrupted command, and a point associated with resuming the command (i.e., information critical for the engine to restart). In some embodiments, the saved data includes the location of the command buffer, the state of the last command being executed prior to the command's completion, and the metadata location needed to continue once the command is resumed. In some embodiments, this information also includes certain engine states associated with the GPU 115 and the location of other context information.
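One hypothetical way to picture the saved context is as a small record whose fields paraphrase the items listed above; nothing here names an actual driver structure.

```cpp
#include <cstdint>

// Illustrative status values for the preempted VF and its command.
enum class VfStatus  : std::uint8_t { Running, Preempted, Saved };
enum class CmdStatus : std::uint8_t { InFlight, Interrupted };

// Hypothetical layout of the context saved at preemption; field names
// paraphrase the description above and are not from any real driver.
struct PreemptedVfContext {
    std::uint64_t stop_point;          // where the command stopped before completing
    VfStatus      vf_status;           // status of the preempted virtual function
    CmdStatus     cmd_status;          // status of the interrupted command
    std::uint64_t resume_point;        // where execution continues on restore
    std::uint64_t command_buffer_addr; // location of the command buffer
    std::uint64_t metadata_addr;       // metadata needed to continue the command
    std::uint64_t engine_state_addr;   // location of saved GPU engine state
};
```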
Once the context information associated with the preempted virtual function 119a is saved and the migration begins, a host driver (not shown) instructs the GPU 115 to extract the saved information (such as, for example, the context save area, the context associated with the virtual function 119a, and the engine context). The saved information also includes metadata, saved into internal SRAM and system memory, related to the command buffer, as well as subsequent engine execution information (i.e., information relating to the execution of subsequent commands or instructions for continued execution after resuming the preempted virtual function 119a).
The context information associated with the preempted virtual function 119a is then transferred into the internal SRAM associated with the destination host GPU (not shown). The extracted data is restored iteratively at the destination host GPU. The host performs an initialization so that the virtual function 119a at the destination GPU is in the same executable state as it was on the source host GPU 115. The state associated with the virtual function 119a is restored to the destination host GPU using the extracted data, and the GPU engine associated with the destination host GPU is instructed to continue execution from the point at which the virtual function 119a was interrupted. Accordingly, various embodiments of the present disclosure provide migration of states associated with virtual functions 119a-119n from a source GPU 115 to a destination GPU without the requirement of saving the entire context associated with each of the virtual functions 119a-119n to memory before migration, thereby increasing migration speed and reducing the overhead associated with the migration of virtual machines 121a-121n from one host computing device 105 to another.
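Putting the preceding steps together, the following C++ sketch traces the preempt, save, transfer, restore, resume sequence end to end. The interfaces are hypothetical stand-ins, not a real driver API; the point is that only the minimal re-initialization context crosses to the destination.

```cpp
#include <cstdint>

// Minimal re-declaration of the context sketched earlier.
struct PreemptedVfContext { std::uint64_t resume_point; /* ...other fields... */ };

// Hypothetical source-GPU interface; stubbed so the sketch compiles.
struct SourceGpu {
    void preempt(int /*vf*/) {}                          // stop at a restartable point
    PreemptedVfContext save_minimal_context(int /*vf*/) { return {}; }
};

// Hypothetical destination-GPU interface.
struct DestinationGpu {
    void load_into_sram(const PreemptedVfContext&) {}    // internal SRAM transfer
    void init_vf_to_source_state(int /*vf*/, const PreemptedVfContext&) {}
    void resume_engine_from(std::uint64_t /*point*/) {}
};

// The flow described above: only the data required to re-initialize the
// virtual function is moved; no full device re-initialization occurs.
void migrate_vf(SourceGpu& src, DestinationGpu& dst, int vf) {
    src.preempt(vf);
    PreemptedVfContext ctx = src.save_minimal_context(vf);
    dst.load_into_sram(ctx);                  // an iterative restore would loop here
    dst.init_vf_to_source_state(vf, ctx);     // same executable state as the source
    dst.resume_engine_from(ctx.resume_point); // continue from the interruption point
}
```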
The source machine 201 implements a hypervisor (not shown) for the physical function 203. Some embodiments of the physical function 203 support multiple virtual functions 119a-119n. A hypervisor launches one or more virtual machines 121a-121n for execution on a physical resource, such as the GPU 115, that supports the physical function 203. The virtual functions 119a-119n are assigned to corresponding virtual machines 121a-121n. In the illustrated embodiment, the virtual function 119a is assigned to the virtual machine 121a, the virtual function 119b is assigned to the virtual machine 121b, and the virtual function 119n is assigned to the virtual machine 121n. The virtual functions 119a-119n then serve as the GPU and provide GPU functionality to the corresponding virtual machines 121a-121n. The virtualized GPU is therefore shared across many virtual machines 121a-121n. In some embodiments, time slicing and context switching are used to provide fair access to the virtual functions 119a-119n, as described further herein.
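The one-to-one pairing described above amounts to a simple mapping maintained by the hypervisor; a hypothetical sketch:

```cpp
#include <map>

// Illustrative hypervisor bookkeeping: each launched virtual machine
// (121a-121n) is paired with its own virtual function (119a-119n).
class Hypervisor {
    std::map<int, int> vf_for_vm_;  // key: VM id, value: VF index
public:
    void launch_vm(int vm_id, int vf_index) { vf_for_vm_[vm_id] = vf_index; }
    int vf_for(int vm_id) const { return vf_for_vm_.at(vm_id); }
};

// Usage mirroring the illustrated embodiment (indices are hypothetical):
// Hypervisor h;
// h.launch_vm(/*121a*/ 0, /*119a*/ 0);
// h.launch_vm(/*121b*/ 1, /*119b*/ 1);
```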
The migration system 200 can detect and extract the command stop point associated with a preempted command. Upon receipt of a migration request, the source GPU is configured to extract a set of information corresponding to the state of the preempted virtual function 119a. For example, when a virtual function 119a is executing and migration is started, the GPU is instructed to preempt the virtual function 119a and save the context of the virtual function 119a, including the point of execution at which the command is paused or interrupted, the status associated with the virtual function 119a, the status of the preempted command, and information associated with resuming execution of the interrupted command (i.e., information critical for the engine to restart). The saved information also includes metadata that was saved into the cache 219 and system memory related to the command buffer 217, the register data 221, information in the system memory 223, and subsequent engine execution information (i.e., information relating to the execution of subsequent commands or instructions for continued execution after resuming the preempted virtual function). For example, the saved information can be associated with the interrupted command and a subsequent command. This information is transferred into a memory such as a cache.
Once the data required for resuming the interrupted command associated with the virtual function 119a at the source machine 201 is saved and the migration is initiated, the host driver instructs the GPU to extract all of the saved information and transfer only the data required to re-initialize the virtual function 119a to the destination machine 205. The destination machine 205 is associated with a corresponding physical function 204. The extracted data is then restored iteratively into the destination machine 205. The destination machine 205 performs an initialization so that a virtual function 119t at the destination machine 205 is in the same executable state as the preempted virtual function 119a was on the source machine 201. The virtual function state is restored to the destination machine 205 using the extracted data, and a command is issued to the GPU engine to continue execution from the point at which the command associated with the preempted virtual function 119a was interrupted on the source machine 201.
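The "extract everything, transfer only what re-initialization needs" step can be sketched as a filtering function. The field names below echo the reference numerals above (217, 219, 221, 223); the extra_state field and all type names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Everything the source GPU saved at preemption (hypothetical layout).
struct SavedState {
    std::vector<std::uint8_t> command_buffer_217;
    std::vector<std::uint8_t> cache_metadata_219;
    std::vector<std::uint8_t> register_data_221;
    std::vector<std::uint8_t> system_memory_223;
    std::vector<std::uint8_t> extra_state;  // not required to re-initialize
};

// The subset that actually travels to the destination machine 205.
struct ReinitPayload {
    std::vector<std::uint8_t> command_buffer;
    std::vector<std::uint8_t> cache_metadata;
    std::vector<std::uint8_t> register_data;
    std::vector<std::uint8_t> system_memory;
};

// Extract everything, but transfer only the re-initialization subset.
ReinitPayload extract_for_transfer(const SavedState& s) {
    return {s.command_buffer_217, s.cache_metadata_219,
            s.register_data_221, s.system_memory_223};
}
```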
Referring next to the accompanying flowchart, the following describes the operation of a portion of the migration system 200 according to various embodiments. The flowchart provides merely one example of a functional arrangement that can be employed to implement the migration described herein. Beginning with block 403, when the migration system 200 receives a migration request, the source GPU is instructed to identify and preempt the virtual function 119a executing during the time interval in which the migration request occurred and to save the context associated with the preempted virtual function 119a, as described above. The saved context is then extracted, only the data required for re-initialization is transferred to the destination machine 205, the virtual function state is restored, and execution resumes from the point of interruption.
A computer readable storage medium includes any non-transitory storage medium, or combination of non-transitory storage media, accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media can include, but is not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-Ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media. The computer readable storage medium is embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).
In some embodiments, certain aspects of the techniques described above are implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM), or other volatile or non-volatile memory devices, and the like. The executable instructions stored on the non-transitory computer readable storage medium are in source code, assembly language code, object code, or another instruction format that is interpreted or otherwise executable by one or more processors.
Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter can be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above can be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.
Foreign application priority data: Application No. 201910098169.8, filed January 2019, China (CN), national.