Unless otherwise indicated, the subject matter described in this section is not prior art to the claims of the present application and is not admitted as being prior art by inclusion in this section.
Hypervisors have traditionally been run entirely in the most privileged central processing unit (CPU) execution context (often referred to as kernel mode because operating system kernels also run in such a mode). Although this was originally done out of necessity, in recent years there has been growing interest in running hypervisors in a less privileged CPU execution context (often referred to as user mode). There are several driving forces for this change. First, it results in a reduced attack surface, as any code running in kernel mode can potentially become a vector for attack. Second, a large number of tasks performed by hypervisors do not require privileged access to system resources and thus can easily be run in user mode. Third, by moving towards a user level hypervisor, certain software simplifications can be achieved within the hypervisor that enable faster and more efficient operation.
One challenge with implementing a user level hypervisor on existing CPU architectures is that, due to the hardware virtualization execution modes supported by such CPUs and the mechanisms for controlling those modes, a user level hypervisor incurs significantly more transitions between more privileged and less privileged CPU execution contexts than a hypervisor that runs entirely in kernel mode. This increase in transitions across different privilege levels can lead to noticeable drops in performance.
In the following description, for purposes of explanation, numerous examples and details are set forth in order to provide an understanding of various embodiments. It will be evident, however, to one skilled in the art that certain embodiments can be practiced without some of these details or can be practiced with modifications or equivalents thereof.
Embodiments of the present disclosure are directed to new CPU hardware features and associated workflows that advantageously reduce the number of CPU execution context transitions across privilege levels necessitated by a user level hypervisor (or in other words, a hypervisor that runs in user mode). As used herein, a “CPU execution context” is a runtime state of a CPU, including its program counter and all of its registers, and is associated with a privilege level indicating the degree of access the CPU has to system resources while running in that context. The most privileged CPU execution context is generally known as kernel mode and a less privileged CPU execution context is generally known as user mode.
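For purely illustrative purposes, the following C sketch models a CPU execution context as defined above; the field names and register count are hypothetical and do not correspond to any particular CPU architecture.

    /* Illustrative model of a CPU execution context: the runtime state of a
     * CPU (program counter plus registers) tagged with a privilege level.
     * Field names and sizes are hypothetical. */
    #include <stdint.h>

    typedef enum {
        PRIVILEGE_KERNEL_MODE,  /* most privileged execution context */
        PRIVILEGE_USER_MODE     /* less privileged execution context */
    } privilege_level_t;

    typedef struct {
        uint64_t program_counter;
        uint64_t general_regs[32];    /* general-purpose register file */
        privilege_level_t privilege;  /* degree of access to system resources */
    } cpu_execution_context_t;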
Many modern CPUs implement a set of hardware features, collectively referred to as hardware-assisted virtualization, that facilitate the operation of hypervisors and their VMs. For example, hardware-assisted virtualization provides four CPU execution modes that correspond to four separate types of CPU execution contexts for running hypervisor software and guest (i.e., VM level) software respectively: (1) a privileged hypervisor mode for running hypervisor code in the most privileged CPU execution context (i.e., kernel mode), (2) an unprivileged hypervisor mode for running hypervisor code in a less privileged CPU execution context (i.e., user mode), (3) a privileged guest mode for running guest code in a context where it perceives itself to have full privileges but the hypervisor remains isolated/protected, and (4) an unprivileged guest mode for running guest code in user mode.
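For illustration, the four execution modes described above can be modeled with a simple C enumeration; the mode names used below are hypothetical and do not correspond to the mnemonics of any real instruction set.

    /* Hypothetical enumeration of the four hardware-assisted virtualization
     * execution modes described above. */
    typedef enum {
        MODE_PRIV_HYPERVISOR,   /* hypervisor code in kernel mode                  */
        MODE_UNPRIV_HYPERVISOR, /* hypervisor code in user mode                    */
        MODE_PRIV_GUEST,        /* guest code that perceives itself as privileged  */
        MODE_UNPRIV_GUEST       /* guest code in user mode                         */
    } cpu_exec_mode_t;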
Hardware-assisted virtualization also provides a control structure, shown within CPU 102 of FIG. 1 as control structure 108, that holds the properties/settings governing the operation of these execution modes and of the guest software running in them.
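For illustration only, the following C sketch shows a hypothetical, highly simplified layout for such a control structure; real architectures define many more fields, and the field names below are assumptions made for this example.

    /* Simplified, hypothetical layout of a per-VM control structure such as
     * control structure 108. The fields are placeholders chosen for illustration. */
    #include <stdint.h>

    typedef struct {
        uint64_t guest_entry_point;  /* where guest execution (re)starts         */
        uint64_t guest_page_table;   /* root of second-level address translation */
        uint64_t exit_reason;        /* why the last guest exit occurred         */
        uint64_t exec_controls;      /* bit flags governing guest behavior       */
    } control_structure_t;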
As mentioned previously, existing hypervisors generally run entirely in the most privileged CPU execution context, which is privileged hypervisor mode on CPUs with hardware-assisted virtualization. However, for reasons such as security, software simplification, and so on, it has become increasingly desirable to move at least a portion of the functionality of hypervisors to unprivileged hypervisor mode. For example, FIG. 2 depicts a hypervisor 200 that is split into a kernel level hypervisor component 202, which runs in privileged hypervisor mode, and a user level hypervisor component 204, which runs in unprivileged hypervisor mode.
An issue with the hypervisor design shown in FIG. 2 is that, on existing CPU architectures, control structure 108 can only be accessed from privileged hypervisor mode. Thus, each time user level hypervisor component 204 needs to read or write a property/setting in control structure 108, CPU 102 must transition from unprivileged hypervisor mode to privileged hypervisor mode so that kernel level hypervisor component 202 can carry out the access, and then transition back.
Another issue is that, with existing CPU architectures, any CPU transition from a hypervisor mode to a guest mode must be initiated from privileged hypervisor mode; this transition cannot be performed directly from unprivileged hypervisor mode. Similarly, any CPU return (i.e., exit) from a guest mode to a hypervisor mode must go to privileged hypervisor mode. As a result, if hypervisor 200 is running in unprivileged hypervisor mode via user level hypervisor component 204 when it needs to switch CPU control to either privileged guest mode or unprivileged guest mode, CPU 102 must first transition to privileged hypervisor mode via kernel level hypervisor component 202 and then transition to privileged/unprivileged guest mode. Further, upon exiting the guest mode, the only option is to return to privileged hypervisor mode. Accordingly, if the intention is to return to unprivileged hypervisor mode, yet another transition is required from privileged hypervisor mode to unprivileged hypervisor mode. This is illustrated as scenario 310 in FIG. 3.
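For illustration, the following C sketch (with a hypothetical transition() helper standing in for a hardware mode switch) traces the four privilege-level transitions implied by scenario 310 for a single guest entry and exit.

    /* Sketch of the transition sequence a conventional CPU imposes when user
     * level hypervisor component 204 wants to run guest code: every entry to
     * and exit from a guest mode is funneled through privileged hypervisor mode. */
    #include <stdio.h>

    static void transition(const char *from, const char *to) {
        printf("%s -> %s\n", from, to);  /* stand-in for a hardware mode switch */
    }

    int main(void) {
        /* component 204 asks kernel level component 202 to enter the guest */
        transition("unprivileged hypervisor mode", "privileged hypervisor mode");
        /* kernel level component 202 performs the actual guest entry */
        transition("privileged hypervisor mode", "guest mode");
        /* a guest exit always lands back in privileged hypervisor mode */
        transition("guest mode", "privileged hypervisor mode");
        /* one more hop to get back to user level component 204 */
        transition("privileged hypervisor mode", "unprivileged hypervisor mode");
        return 0;  /* four privilege-level transitions for one guest entry/exit */
    }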
To address the foregoing, FIG. 4 depicts an enhanced CPU 402 that implements a set of new hardware-assisted virtualization features, namely: (1) the ability for hypervisor code running in unprivileged hypervisor mode to directly access existing control structure 108, (2) the ability to transition directly from unprivileged hypervisor mode to a guest mode, and (3) the ability to exit directly from a guest mode to unprivileged hypervisor mode. Enhanced CPU 402 also includes a new control structure 404 that specifies which properties/settings of control structure 108 are accessible from unprivileged hypervisor mode, a new control structure 406 that specifies which guest events/operations cause a direct exit to unprivileged hypervisor mode (and where execution should resume), and a control structure 408 through which the new features themselves are enabled or disabled.
The foregoing new hardware-assisted virtualization features advantageously reduce the number of CPU execution context transitions across privilege levels that are necessitated by user level hypervisor component 204 of hypervisor 200, thereby allowing the hypervisor to reap the benefits of executing certain tasks at the user level (e.g., improved security, simplified architecture, etc.) while at the same time maintaining a high level of performance. For example, with feature (2), user level hypervisor component 204 can cause enhanced CPU 402 to transition directly from unprivileged hypervisor mode to privileged or unprivileged guest mode, avoiding the intermediate transition to privileged hypervisor mode described above.
And with feature (3), upon the occurrence of a guest event/operation marked in control structure 406, enhanced CPU 402 can exit directly from the guest mode to unprivileged hypervisor mode, again avoiding an intermediate transition through privileged hypervisor mode.
In one set of embodiments, kernel level hypervisor component 202 can enable new features (1)-(3) and configure new control structures 404 and 406 upon startup of hypervisor 200. In alternative embodiments, kernel level hypervisor component 202 can dynamically enable/disable features (1)-(3) and/or dynamically configure/reconfigure control structures 404 and 406 during the runtime of hypervisor 200.
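For illustration, the following C sketch models the startup-time configuration described above; the structure layouts and the kernel_component_startup() function are hypothetical, and a real implementation would instead program architecture-defined registers or in-memory structures.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical layouts for the new control structures. */
    struct ctrl_404 { uint64_t user_accessible_fields; };  /* settings of 108 exposed to user mode */
    struct ctrl_406 { uint64_t direct_exit_events;         /* guest events exiting straight to user mode */
                      uint64_t user_return_address; };     /* where component 204 resumes */
    struct ctrl_408 { bool feat_user_access, feat_direct_entry, feat_direct_exit; };

    /* Startup-time configuration performed by kernel level hypervisor component 202. */
    void kernel_component_startup(struct ctrl_404 *c404, struct ctrl_406 *c406,
                                  struct ctrl_408 *c408, uint64_t user_handler_addr) {
        c408->feat_user_access  = true;       /* enable feature (1) */
        c408->feat_direct_entry = true;       /* enable feature (2) */
        c408->feat_direct_exit  = true;       /* enable feature (3) */
        c404->user_accessible_fields = 0x3;   /* illustrative permission bitmap */
        c406->direct_exit_events     = 0x1;   /* illustrative event mask */
        c406->user_return_address    = user_handler_addr;
    }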
The remaining sections of this disclosure describe workflows that may be implemented by kernel level hypervisor component 202 and user level hypervisor component 204 using the new hardware-assisted virtualization features of enhanced CPU 402 to enable user level hypervisor access to existing control structure 108 and to perform direct unprivileged hypervisor mode to guest mode transitions according to certain embodiments. It should be appreciated that these workflows are provided as examples and that various modifications to them are possible.
Starting with step 602, kernel level hypervisor component 202 can modify control structure 408 to enable the new hardware-assisted virtualization features. In addition, at step 604, kernel level hypervisor component 202 can modify control structure 404 to mark certain properties/settings of control structure 108 as being accessible from unprivileged hypervisor mode.
At a later time, user level hypervisor component 204 can send a request (e.g., invoke an instruction) to enhanced CPU 402 to read or write a particular property/setting in control structure 108 (step 606). In response, CPU 402 can check control structure 404 to determine whether the read or write operation for that property/setting is allowed (step 608).
If the answer is yes, CPU 402 can proceed with executing the read/write operation on control structure 108 and return an appropriate response to user level hypervisor component 204 (step 610). For example, in the case of a read request, CPU 402 can return the current value of the property/setting in control structure 108. In the case of a write request, CPU 402 can return an acknowledgement indicating that the write operation has been completed successfully. However, if the answer at step 608 is no, CPU 402 can generate and return an error message (e.g., exception) to user level hypervisor component 204 indicating that the requested operation is not allowed (step 612).
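For illustration, the workflow of steps 606-612 can be modeled in C as shown below; the access_ctrl_108() helper and the permission-bitmap layout of control structure 404 are assumptions made for this sketch, as the check would in practice be performed by the CPU itself when the access instruction is executed.

    #include <stdint.h>
    #include <stdbool.h>

    #define ERR_NOT_ALLOWED (-1)

    struct ctrl_108 { uint64_t fields[64]; };              /* existing control structure      */
    struct ctrl_404 { uint64_t user_accessible_fields; };  /* per-field permission bitmap     */

    /* Returns 0 on success, ERR_NOT_ALLOWED if control structure 404 forbids the
     * access from unprivileged hypervisor mode (analogous to raising an exception). */
    int access_ctrl_108(struct ctrl_108 *cs108, const struct ctrl_404 *cs404,
                        unsigned field, bool is_write, uint64_t *value) {
        if (field >= 64 || !(cs404->user_accessible_fields & (1ULL << field)))
            return ERR_NOT_ALLOWED;           /* step 612: operation not allowed   */
        if (is_write)
            cs108->fields[field] = *value;    /* step 610: perform the write       */
        else
            *value = cs108->fields[field];    /* step 610: return the current value */
        return 0;
    }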
Starting with step 702, kernel level hypervisor component 202 can modify control structure 408 to enable the new hardware-assisted virtualization features. In addition, at step 704, kernel level hypervisor component 202 can modify control structure 406 to mark certain guest events/operations that may occur while CPU 402 is in privileged or unprivileged guest mode as causing an exit/transition from that guest mode directly to unprivileged hypervisor mode. As part of step 704, kernel level hypervisor component 202 can set within control structure 406, for each marked guest event/operation, a program address of user level hypervisor component 204 to return to.
At a later time, user level hypervisor component 204 can encounter a scenario in which it would like to transition CPU control to one of the guest modes and thus can send a request to enhanced CPU 402 to switch to that guest mode (step 706). In certain embodiments, step 706 can comprise invoking a new CPU instruction that specifies this transition (i.e., from unprivileged hypervisor mode to one of the guest modes). In response, CPU 402 can transition directly to the requested guest mode and run appropriate guest code in that execution mode (step 708).
At step 710, CPU 402 can detect, while running in the guest mode, the occurrence of a guest event or operation that is marked in control structure 406 and thus indicates that a transition should occur to unprivileged hypervisor mode. Finally, at step 712, CPU 402 can transition directly from the guest mode to unprivileged hypervisor mode and restart execution at the program address of user level hypervisor component 204 specified for that guest event/operation in control structure 406.
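For illustration, the workflow of steps 706-712 can be modeled in C as shown below; enter_guest_direct() stands in for the new CPU instruction of step 706, and the event mask/return handler fields of control structure 406 are hypothetical simplifications.

    #include <stdint.h>
    #include <stdio.h>

    struct ctrl_406 {
        uint64_t direct_exit_events;           /* events marked for direct exit (step 704) */
        void (*user_return_handler)(uint64_t); /* where to resume in component 204         */
    };

    /* Simulates running guest code until some event occurs; returns the event mask. */
    static uint64_t run_guest(void) { return 0x1; /* illustrative event */ }

    static void enter_guest_direct(const struct ctrl_406 *c406) {
        uint64_t event = run_guest();             /* steps 708-710 */
        if (c406->direct_exit_events & event)
            c406->user_return_handler(event);     /* step 712: exit directly to user mode */
        /* otherwise the exit would fall back to privileged hypervisor mode */
    }

    static void user_level_exit_handler(uint64_t event) {
        printf("handled guest event %llu in unprivileged hypervisor mode\n",
               (unsigned long long)event);
    }

    int main(void) {
        struct ctrl_406 c406 = { .direct_exit_events = 0x1,
                                 .user_return_handler = user_level_exit_handler };
        enter_guest_direct(&c406);                /* step 706 */
        return 0;
    }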
Certain embodiments described herein can employ various computer-implemented operations involving data stored in computer systems. For example, these operations can require physical manipulation of physical quantities—usually, though not necessarily, these quantities take the form of electrical or magnetic signals, where they (or representations of them) are capable of being stored, transferred, combined, compared, or otherwise manipulated. Such manipulations are often referred to in terms such as producing, identifying, determining, comparing, etc. Any operations described herein that form part of one or more embodiments can be useful machine operations.
Further, one or more embodiments can relate to a device or an apparatus for performing the foregoing operations. The apparatus can be specially constructed for specific required purposes, or it can be a generic computer system comprising one or more general purpose processors (e.g., Intel or AMD x86 processors) selectively activated or configured by program code stored in the computer system. In particular, various generic computer systems may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations. The various embodiments described herein can be practiced with other computer system configurations including handheld devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
Yet further, one or more embodiments can be implemented as one or more computer programs or as one or more computer program modules embodied in one or more non-transitory computer readable storage media. The term non-transitory computer readable storage medium refers to any storage device, based on any existing or subsequently developed technology, that can store data and/or computer programs in a non-transitory state for access by a computer system. Examples of non-transitory computer readable media include a hard drive, network attached storage (NAS), read-only memory, random-access memory, flash-based nonvolatile memory (e.g., a flash memory card or a solid state disk), persistent memory, NVMe device, a CD (Compact Disc) (e.g., CD-ROM, CD-R, CD-RW, etc.), a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The non-transitory computer readable media can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
In addition, while certain virtualization methods referenced herein have generally assumed that virtual machines present interfaces consistent with a particular hardware system, persons of ordinary skill in the art will recognize that the methods referenced can be used in conjunction with virtualizations that do not correspond directly to any particular hardware system. Virtualization systems in accordance with the various embodiments, implemented as hosted embodiments, non-hosted embodiments or as embodiments that tend to blur distinctions between the two, are all envisioned. Furthermore, certain virtualization operations can be wholly or partially implemented in hardware.
Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest operating system that performs virtualization functions. Plural instances can be provided for components, operations, or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the present disclosure. In general, structures and functionality presented as separate components in exemplary configurations can be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component can be implemented as separate components.
As used in the description herein and throughout the claims that follow, “a,” “an,” and “the” includes plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
The above description illustrates various embodiments along with examples of how aspects of particular embodiments may be implemented. These examples and embodiments should not be deemed to be the only embodiments and are presented to illustrate the flexibility and advantages of particular embodiments as defined by the following claims. Other arrangements, embodiments, implementations, and equivalents can be employed without departing from the scope hereof as defined by the claims.