1. Field
This disclosure concerns architectures and methods for implementing memory management in a virtualization environment.
2. Background
A computing system utilizes memory to hold data that the computing system uses to perform its processing, such as instruction data or computation data. The memory is usually implemented with semiconductor devices organized into memory cells, which are associated with and accessed using a memory address. The memory device itself is often referred to as “physical memory” and addresses within the physical memory are referred to as “physical addresses” or “physical memory addresses”.
Many computing systems also use the concept of “virtual memory”, which is memory that is logically allocated to an application on a computing system. The virtual memory corresponds to a “virtual address” or “logical address” which maps to a physical address within the physical memory. This allows the computing system to de-couple the physical memory from the memory that an application thinks it is accessing. The virtual memory is usually allocated at the software level, e.g., by an operating system (OS) that takes responsibility for determining the specific physical address within the physical memory that correlates to the virtual address of the virtual memory. A memory management unit (MMU) is the component that is implemented within a processor, processor core, or central processing unit (CPU) to handle accesses to the memory. One of the primary functions of many MMUs is to perform translations of virtual addresses to physical addresses.
Modern computing systems may also implement memory usage in the context of virtualization environments. A virtualization environment contains one or more “virtual machines” or “VMs”, which are software-based implementations of a machine in which the hardware resources of a real “host” computer (or “root” computer, where these terms are used interchangeably herein) are virtualized or transformed into the underlying support for a fully functional “guest” virtual machine that can run its own operating system and applications on the underlying physical resources just like a real computer. By encapsulating an entire machine, including CPU, memory, operating system, storage devices, and network devices, a virtual machine is completely compatible with most standard operating systems, applications, and device drivers. Virtualization allows one to run multiple virtual machines on a single physical machine, with each virtual machine sharing the resources of that one physical computer across multiple environments. Different virtual machines can run different operating systems and multiple applications on the same physical computer.
One reason for the broad adoption of virtualization in modern business and computing environments is the resource utilization advantage provided by virtual machines. Without virtualization, if a physical machine is limited to a single dedicated operating system, then during periods of inactivity by that operating system the physical machine is not utilized to perform useful work. This is wasteful and inefficient if there are users on other physical machines who are currently waiting for computing resources. To address this problem, virtualization allows multiple VMs to share the underlying physical resources so that during periods of inactivity by one VM, other VMs can take advantage of the resource availability to process workloads. This can produce great efficiencies in the utilization of physical devices, and can result in reduced redundancies and better resource cost management.
Memory is one type of physical resource that can be managed and utilized in a virtualization environment. A virtual machine that implements a guest operating system may allocate its own virtual memory (“guest virtual memory”), which corresponds to a virtual address (“guest virtual address” or “GVA”) allocated by the guest operating system. Since the guest virtual memory is being allocated in the context of a virtual machine, the guest OS relates the GVA to what it believes to be an actual physical address, but which is in fact just virtualized physical memory on the virtualized hardware of the virtual machine. This virtual physical address is often referred to as a “guest physical address” or “GPA”. The guest physical address can then be mapped to the underlying physical memory within the host system, such that a guest physical address maps to a host physical address.
As is evident from the previous paragraph, each memory access in a virtualization environment may therefore correspond to at least two levels of indirection. A first level of indirection exists between the guest virtual address and the guest physical address. A second level of indirection exists between the guest physical address and the host physical address.
Conventionally, multiple translation procedures are separately performed to implement each of these two levels of indirection for the memory access in a virtualization environment. Therefore, an MMU in a virtualization environment would perform a first translation procedure to translate the guest virtual address into the guest physical address. The MMU would then perform a second translation procedure to translate the guest physical address into the host physical address.
The issue with this multi-stage translation approach is that each translation procedure is typically expensive to perform, e.g., in terms of time costs, computation costs, and memory access costs.
Therefore, there is a need for an improved approach to implement memory management which can more efficiently perform memory access in a virtualization environment.
The following presents a simplified summary of some embodiments in order to provide a basic understanding of the invention. This summary is not an extensive overview and is not intended to identify key/critical elements or to delineate the scope of the claims. Its sole purpose is to present some embodiments in a simplified form as a prelude to the more detailed description that is presented below.
The present disclosure describes an architecture and method for performing memory management in a virtualization environment. According to some embodiments, multiple levels of virtualization-specific caches are provided to perform address translations, where at least one of the virtualization-specific caches contains a mapping between a guest virtual address and a host physical address. This type of caching implementation serves to minimize the need to perform costly multi-stage translations in a virtualization environment. In some embodiments, a micro translation lookaside buffer (uTLB) is used to provide a mapping between a guest virtual address and a host physical address. For address mappings that are cached in the uTLB, this approach avoids multiple address translations to obtain a host physical address from a guest virtual address.
Also described is an approach to implement a lookup structure that includes a content addressable memory (CAM) which is associated with multiple memory components. The CAM provides one or more pointers into the plurality of downstream memory structures. In some embodiments, a TLB for caching address translation mappings is embodied as a combination of a CAM associated with parallel downstream memory structures, where a first memory structure corresponds to host address mappings and a second memory structure corresponds to guest address mappings.
Further details of aspects, objects, and advantages of various embodiments are described below in the detailed description, drawings, and claims. Both the foregoing general description and the following detailed description are exemplary and explanatory, and are not intended to be limiting as to the scope of the disclosure.
The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the relevant art to make and use the invention.
The present invention will now be described with reference to the accompanying drawings. In the drawings, generally, like reference numbers indicate identical or functionally similar elements. Additionally, generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
This disclosure describes improved approaches to perform memory management in a virtualization environment. According to some embodiments, multiple levels of caches are provided to perform address translations, where at least one of the caches contains a mapping between a guest virtual address and a host physical address. This type of caching implementation serves to minimize the need to perform costly multi-stage translations in a virtualization environment.
A virtual machine that implements a guest operating system will attempt to access guest virtual memory using the guest virtual address 102. One or more memory structures 110 may be employed to maintain information that relates the guest virtual address 102 to the guest physical address 104. Therefore, a first translation procedure is performed to access the GVA to GPA memory structure(s) 110 to translate the guest virtual address 102 to the guest physical address 104.
Once the guest physical address 104 has been obtained, a second translation procedure is performed to translate the guest physical address 104 into the host physical address 106. Another set of one or more memory structures 112 may be employed to maintain information that relates the guest physical address 104 to the host physical address 106. The second translation procedure is performed to access the GPA to HPA memory structure(s) 112 to translate the guest physical address 104 to the host physical address 106.
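The two-stage procedure can be summarized with a simplified, non-limiting sketch in C. The table layouts, page size, and helper names below are illustrative assumptions rather than a description of any particular implementation of memory structures 110 and 112; the sketch merely shows that two separate lookups are performed, one per level of indirection.

```c
/* Minimal sketch of the two-stage translation described above. The lookup
 * tables, page size, and helper names are illustrative assumptions. */
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

/* One entry of a GVA->GPA or GPA->HPA mapping structure (110 or 112). */
typedef struct {
    uint64_t in_page;   /* input page number  */
    uint64_t out_page;  /* output page number */
    bool     valid;
} map_entry_t;

/* A linear search stands in for whatever page-table or TLB lookup the
 * real structures would use. */
static bool lookup(const map_entry_t *tbl, size_t n,
                   uint64_t in_page, uint64_t *out_page)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].valid && tbl[i].in_page == in_page) {
            *out_page = tbl[i].out_page;
            return true;
        }
    }
    return false;
}

int main(void)
{
    /* Structure 110: GVA page -> GPA page; structure 112: GPA page -> HPA page. */
    map_entry_t gva_to_gpa[] = { { 0x10, 0x2a, true } };
    map_entry_t gpa_to_hpa[] = { { 0x2a, 0x91, true } };

    uint64_t gva = (0x10ull << PAGE_SHIFT) | 0x123;   /* example guest virtual address */
    uint64_t gpa_page, hpa_page;

    /* First translation procedure: GVA -> GPA. */
    if (!lookup(gva_to_gpa, 1, gva >> PAGE_SHIFT, &gpa_page)) return 1;
    /* Second translation procedure: GPA -> HPA. */
    if (!lookup(gpa_to_hpa, 1, gpa_page, &hpa_page)) return 1;

    uint64_t hpa = (hpa_page << PAGE_SHIFT) | (gva & PAGE_MASK);
    printf("GVA 0x%llx -> HPA 0x%llx\n",
           (unsigned long long)gva, (unsigned long long)hpa);
    return 0;
}
```

As the sketch illustrates, only the page number portion of the address is translated at each stage; the page offset is carried through unchanged.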
As previously noted, the issue with this multi-stage translation approach is that each translation procedure may be relatively expensive to perform. If the translation data is not cached, then one or more page tables would need to be loaded and processed to handle each address translation for each of the two translation stages. Even if the translation data is cached in TLBs, multiple TLB accesses are needed to handle the two stages of the address translation, since a first TLB is accessed for the GVA to GPA translation and a second TLB is accessed for the GPA to HPA translation.
Virtualization works by inserting a thin layer of software on the computer hardware or on a host operating system, which contains a virtual machine monitor or “hypervisor” 216. The hypervisor 216 transparently allocates and manages the underlying resources within the host physical machine 220 on behalf of the virtual machine 202. In this way, applications on the virtual machine 202 are completely insulated from the underlying real resources in the host physical machine 220. Virtualization allows multiple virtual machines 202 to run on a single host physical machine 220, with each virtual machine 202 sharing the resources of that host physical machine 220. The different virtual machines 202 can run different operating systems and multiple applications on the same host physical machine 220. This means that multiple applications on multiple virtual machines 202 may be concurrently running and sharing the same underlying set of memory within the host physical memory 228.
In the system of
Within the host processor 222 of host machine 220, the multiple levels of caching are implemented with a first level of caching provided by a micro-TLB 226 (“uTLB”) and a second level of caching provided by a memory management unit (“MMU”) 224. The first level of caching provided by the uTLB 226 provides a direct mapping between a guest virtual address and a host physical address. If the necessary mapping is not found in the uTLB 226 (or the mapping exists in uTLB 226 but is invalid), then a second level of caching provided by the MMU 224 can be used to perform multi-stage translations of the address data.
The MMU 224 includes multiple lookup structures to handle the multiple address translations that can be performed to obtain a host physical address (address output 322) from an address input 320. In particular, the MMU 224 includes a guest TLB 304 to provide a translation of an address input 320 in the form of a guest virtual address to a guest physical address. The MMU also includes a root TLB 306 to provide address translations to host physical addresses. In the virtualization context, the input to the root TLB 306 is a guest physical address that is mapped within the root TLB 306 to a host physical address. In the non-virtualization context, the address input 320 is an ordinary virtual address that bypasses the guest TLB 304 (via mux 330), and which is mapped within the root TLB 306 to its corresponding host physical address.
In general, a TLB is used to reduce virtual address translation time, and is often implemented as a table in a processor's memory that contains information about the pages in memory that have been recently accessed. Therefore, the TLB functions as a cache to enable faster computing because it caches a mapping between a first address and a second address. In the virtualization context, the guest TLB 304 caches mappings between guest virtual addresses and guest physical addresses, while the root TLB 306 caches mappings between guest physical addresses and host physical addresses.
If a given memory access request from an application does not correspond to mappings cached within the guest TLB 304 and/or root TLB 306, then this cache miss/exception will require much more expensive operations by a page walker to access page table entries within one or more page tables to perform address translations. However, once the page walker has performed the address translation, the translation data can be stored within the guest TLB 304 and/or the root TLB 306 to cache the address translation mappings for a subsequent memory access for the same address values.
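The caching behavior described above can be modeled with the following simplified sketch of a small, fully associative TLB; the entry count, round-robin replacement, and the stubbed page-walk result are illustrative assumptions and do not describe the actual guest TLB 304 or root TLB 306 hardware.

```c
/* Toy model of a TLB as an address-translation cache: a lookup that may
 * miss, and a fill that caches the result of a (stubbed) page walk. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define TLB_ENTRIES 16

typedef struct {
    uint64_t tag;    /* input page number (e.g., GVA page for the guest TLB,
                        GPA page for the root TLB) */
    uint64_t data;   /* output page number (e.g., GPA page or HPA page) */
    bool     valid;
} tlb_entry_t;

typedef struct {
    tlb_entry_t e[TLB_ENTRIES];
    unsigned    next;            /* simple round-robin replacement pointer */
} tlb_t;

/* Return true and the cached translation on a hit, false on a miss. */
static bool tlb_lookup(const tlb_t *t, uint64_t tag, uint64_t *data)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].tag == tag) {
            *data = t->e[i].data;
            return true;
        }
    }
    return false;
}

/* Install a translation, e.g., after a page walk has produced it. */
static void tlb_fill(tlb_t *t, uint64_t tag, uint64_t data)
{
    unsigned i = t->next;
    t->e[i] = (tlb_entry_t){ .tag = tag, .data = data, .valid = true };
    t->next = (i + 1) % TLB_ENTRIES;
}

int main(void)
{
    tlb_t guest_tlb = {0};
    uint64_t gpa_page;

    /* First access misses; a (stubbed) page walk supplies the translation,
     * which is then cached for subsequent accesses to the same page. */
    if (!tlb_lookup(&guest_tlb, 0x10, &gpa_page)) {
        gpa_page = 0x2a;                      /* result of the page walk (stub) */
        tlb_fill(&guest_tlb, 0x10, gpa_page);
    }

    /* A second access to the same page now hits in the TLB. */
    if (tlb_lookup(&guest_tlb, 0x10, &gpa_page))
        printf("GVA page 0x10 -> GPA page 0x%llx\n", (unsigned long long)gpa_page);
    return 0;
}
```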
While the cached data within the combination of the guest TLB 304 and the root TLB 306 in the MMU 224 provides a certain level of performance improvement, at least two lookup operations (a first lookup in the guest TLB 304 and a second lookup in the root TLB 306) are still required with these structures to perform a full translation from a guest virtual address to a host physical address.
To provide even further processing efficiencies, the uTLB 226 provides a single caching mechanism that cross-references a guest virtual address with its corresponding absolute host physical address in the physical memory 228. The uTLB 226 enables faster computing because it allows translations from a guest virtual address to a host physical address to be performed with only a single lookup operation within the uTLB 226.
In effect, the uTLB 226 provides a very fast L1 cache for address translations between guest virtual addresses and host physical addresses. The combination of the guest TLB 304 and the root TLB 306 in the MMU 224 therefore provides a (less efficient) L2 cache that can nevertheless still be used to provide the desired address translation if the required mapping data is not in the L1 cache (uTLB 226). If the desired mapping data is not in either the L1 or the L2 cache, then the less efficient page walker is employed to obtain the desired translation data, which is then used to populate either or both of the L1 cache (uTLB 226) and the L2 cache (guest TLB 304 and root TLB 306).
At 404, a check is made whether a mapping exists for that guest virtual address within the L1 cache. The uTLB in the L1 cache includes one or more memory structures to maintain address translation mappings for guest virtual addresses, such as table structures in a memory device to map between different addresses. A lookup is performed within the uTLB to determine whether the desired mapping is currently cached within the uTLB.
Even if a mapping does exist within the uTLB for the guest virtual address, under certain circumstances it is possible that the existing mapping within the uTLB is invalid and should not be used for the address translation. For example, as described in more detail below, since the address translations were last cached in the uTLB, the memory region of interest may have changed from being mapped memory to unmapped memory. This change in status of the cached translation data for the memory region would render the previously cached data in the uTLB invalid.
Therefore, at 406, the cached data for the guest virtual address within the uTLB is checked to determine whether it is still valid. If the cached translation data is still valid, then at 408, the data within the L1 cache of the uTLB is used to perform the address translation from the guest virtual address to the host physical address. Thereafter, at 410, the host physical address is provided to perform the desired memory access.
If the guest virtual address mapping is not found in the L1 uTLB cache, or is found in the uTLB but the mapping is no longer valid, then the L2 cache is checked for the appropriate address translations. A lookup is performed within a guest TLB to perform a translation from the guest virtual address to a guest physical address. If the desired mapping data is not found in the guest TLB, then a page walker (e.g., a hardware page walker) is employed to perform the translation and to then store the mapping data in the guest TLB.
Once the guest physical address is identified, another lookup is performed at 412 within a root TLB to perform a translation from the guest physical address to a host physical address. If the desired mapping data is not found in the root TLB, then a page walker is employed to perform the translation between the GPA and the HPA, and to then store the mapping data in the root TLB.
At 414, the mapping data from the L2 cache (guest TLB and root TLB) is stored into the L1 cache (uTLB). The mapping data is stored within the L1 cache so that the next time software on the virtual machine needs to access memory at the same guest virtual address, only a single lookup is needed (within the uTLB) to perform the necessary address translation for the memory access. Thereafter, at 410, the host physical address is provided for memory access.
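The overall lookup sequence just described may be sketched, in a non-limiting way, as the following control flow. The structures are reduced to single entries, the page walker is stubbed, and validity is modeled as a simple flag rather than the mapped/unmapped check discussed later; the reference numerals in the comments correspond to the operations described above.

```c
/* Control-flow sketch of the multi-level lookup: L1 uTLB, then the L2
 * guest/root TLBs with a stubbed page-walker fallback, then an L1 refill. */
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint64_t gva, hpa; bool present, valid; } utlb_entry_t;   /* L1 */
typedef struct { uint64_t gva, gpa; bool present; }        guest_tlb_t;    /* L2, stage 1 */
typedef struct { uint64_t gpa, hpa; bool present; }        root_tlb_t;     /* L2, stage 2 */

/* Stubs: in hardware a page walker would consult the guest and root page tables. */
static uint64_t page_walk_gva_to_gpa(uint64_t gva) { return gva ^ 0x1000; }
static uint64_t page_walk_gpa_to_hpa(uint64_t gpa) { return gpa ^ 0x8000; }

uint64_t translate(uint64_t gva, utlb_entry_t *utlb,
                   guest_tlb_t *gtlb, root_tlb_t *rtlb)
{
    /* 404/406: single lookup in the L1 uTLB, used if present and still valid. */
    if (utlb->present && utlb->gva == gva && utlb->valid)
        return utlb->hpa;                       /* 408/410: cached GVA->HPA mapping */

    /* GVA -> GPA via the guest TLB, falling back to the page walker on a miss. */
    uint64_t gpa;
    if (gtlb->present && gtlb->gva == gva) {
        gpa = gtlb->gpa;
    } else {
        gpa = page_walk_gva_to_gpa(gva);
        *gtlb = (guest_tlb_t){ .gva = gva, .gpa = gpa, .present = true };
    }

    /* 412: GPA -> HPA via the root TLB, again with a walker fallback. */
    uint64_t hpa;
    if (rtlb->present && rtlb->gpa == gpa) {
        hpa = rtlb->hpa;
    } else {
        hpa = page_walk_gpa_to_hpa(gpa);
        *rtlb = (root_tlb_t){ .gpa = gpa, .hpa = hpa, .present = true };
    }

    /* 414: cache the combined GVA->HPA mapping in the uTLB so the next access
     * to this guest virtual address needs only one lookup. */
    *utlb = (utlb_entry_t){ .gva = gva, .hpa = hpa, .present = true, .valid = true };

    /* 410: provide the host physical address for the memory access. */
    return hpa;
}
```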
Assume that uTLB 226 either does not contain an address mapping for the guest virtual address 102, or does contain an address mapping which is no longer valid. In this case, the procedure is to check for the required mappings within the L2 cache in the MMU 224. In particular, as shown in
This is illustrated starting with
The uTLB 226 may be implemented using any suitable TLB architecture.
The uTLB 226 of
The CAM 602 stores mappings between address inputs and entries within the root data array 604 and the guest data array 606. The root data array 604 stores mappings to host physical addresses. The guest data array 606 stores mappings to guest physical addresses. In operation, the CAM 602 receives inputs in the form of addresses. In a virtualization context, the CAM 602 may receive a guest virtual address as an input. The CAM 602 provides a pointer output that identifies the entries within the root data array 604 and the guest data array 606 for a guest virtual address of interest.
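By way of non-limiting illustration, this organization may be sketched as follows; the entry count, field widths, and the Guest-ID and Unmap fields (discussed further below) are illustrative assumptions. The key point is that a single CAM match produces one index that selects corresponding entries in both downstream data arrays.

```c
/* Structural sketch of the uTLB organization: a CAM keyed on the input
 * address returns an index into two parallel downstream data arrays. */
#include <stdint.h>
#include <stdbool.h>

#define UTLB_ENTRIES 32

typedef struct {                 /* CAM 602: one tag per uTLB entry */
    uint64_t vaddr_tag;          /* GVA (guest context) or RVA (root context) */
    uint16_t guest_id;           /* disambiguates guest vs. root translations */
    bool     valid;
} cam_entry_t;

typedef struct {                 /* root data array 604 */
    uint64_t hpa_page;           /* host (root) physical page */
} root_data_t;

typedef struct {                 /* guest data array 606 */
    uint64_t gpa_page;           /* guest physical page */
    bool     unmap;              /* mapped/unmapped status when cached (field 704) */
} guest_data_t;

typedef struct {
    cam_entry_t  cam[UTLB_ENTRIES];
    root_data_t  root_data[UTLB_ENTRIES];
    guest_data_t guest_data[UTLB_ENTRIES];
} utlb_t;

/* A CAM match yields a single index i; the same i points at the
 * corresponding entries in both downstream data arrays. */
int cam_match(const utlb_t *u, uint64_t vaddr_tag, uint16_t guest_id)
{
    for (int i = 0; i < UTLB_ENTRIES; i++)
        if (u->cam[i].valid &&
            u->cam[i].vaddr_tag == vaddr_tag &&
            u->cam[i].guest_id  == guest_id)
            return i;
    return -1;   /* uTLB miss */
}
```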
In accordance with a further embodiment,
The uTLB 226 is a subset of the MMU 224, in accordance with a further embodiment of the present invention. Therefore, a valid entry in the uTLB 226 must also exist in the MMU 224. Conversely, if an entry does not exist in the MMU 224, then it cannot exist in the uTLB 226. As a result, if either half of the translation is removed from the MMU 224, then the full translation in the uTLB 226 also needs to be removed. If the GVA to GPA translation is removed from the guest TLB 304, then the MMU instructs the uTLB 226 to CAM on the GVA in the CAM array 602. If a match is found, then the matching entry is invalidated, in accordance with an embodiment of the present invention. Likewise, if the GPA to RPA translation is removed from the root TLB 306, then the MMU instructs the uTLB 226 to CAM on the GPA in the GPA CAM array 608.
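A non-limiting sketch of this invalidation rule is shown below; the entry layout is illustrative, and the two functions stand in for the CAM operations on the CAM array 602 (keyed by GVA) and the GPA CAM array 608 (keyed by GPA), respectively.

```c
/* Sketch of the inclusive-invalidation rule: when either half of a
 * translation is dropped from the MMU's guest TLB or root TLB, any uTLB
 * entry built from it is invalidated by matching on the relevant address. */
#include <stdint.h>
#include <stdbool.h>

#define UTLB_ENTRIES 32

typedef struct {
    uint64_t gva_tag;    /* matched by the CAM array 602 */
    uint64_t gpa_tag;    /* matched by the GPA CAM array 608 */
    bool     valid;
} utlb_entry_t;

typedef struct { utlb_entry_t e[UTLB_ENTRIES]; } utlb_t;

/* Guest TLB dropped a GVA->GPA translation: CAM on the GVA and invalidate
 * any uTLB entry that was built from it. */
void utlb_invalidate_by_gva(utlb_t *u, uint64_t gva_tag)
{
    for (int i = 0; i < UTLB_ENTRIES; i++)
        if (u->e[i].valid && u->e[i].gva_tag == gva_tag)
            u->e[i].valid = false;
}

/* Root TLB dropped a GPA->RPA translation: CAM on the GPA instead. */
void utlb_invalidate_by_gpa(utlb_t *u, uint64_t gpa_tag)
{
    for (int i = 0; i < UTLB_ENTRIES; i++)
        if (u->e[i].valid && u->e[i].gpa_tag == gpa_tag)
            u->e[i].valid = false;
}
```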
Moreover, since uTLB 226 includes both Root (RVA to RPA) and Guest (GVA to RPA) translations, additional information is included in the uTLB to disambiguate between the two contexts, in accordance with an embodiment of the present invention. This information includes, by way of non-limiting example, the Guest-ID field shown in
One skilled in the relevant arts will appreciate that while the techniques described herein can be utilized to improve the performance of GVA to RPA translations, they remain capable of handling RVA to RPA translations as well. In accordance with an embodiment of the present invention, the structure provided to improve the performance of GVA to RPA translations is usable to perform RVA to RPA translations without further modification.
Of particular interest is the “Unmap” data field 704 in the guest data array structure 702 of
To explain, consider a system implementation that permits a memory region to be designated as definitively “mapped”, definitively “unmapped”, or either “mapped/unmapped”. A region that is definitively mapped corresponds to virtual addresses that require translation to a physical address. A region that is definitively unmapped corresponds to addresses that bypass the translation since the address input is the actual physical address. A region that can be either mapped or unmapped creates the possibility of a dynamic change in the status of that memory region from being mapped to unmapped, or vice versa.
This means that a guest virtual address corresponds to a first physical address in a mapped mode, but that same guest virtual address may correspond to an entirely different second physical address in an unmapped mode. Since the memory may dynamically change from being mapped to unmapped, and vice versa, cached mappings may become incorrect after a dynamic change in the mapped/unmapped status of a memory region. In a system that supports these types of memory regions, the memory management mechanism for the host processor should be robust enough to be able to handle such dynamic changes in the mapped/unmapped status of memory regions.
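The consequence can be illustrated with a short, assumed-semantics sketch: the same guest virtual address yields different physical addresses depending on whether its region is currently mapped (translated) or unmapped (used directly), which is why a cached translation can become stale after a status change.

```c
/* Illustration, under assumed semantics, of why a dynamic mapped/unmapped
 * change invalidates cached translations for the same guest virtual address. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for whatever translation the mapped path would perform. */
static uint64_t translate_mapped(uint64_t gva) { return gva + 0x40000000ull; }

static uint64_t resolve(uint64_t gva, bool region_is_mapped)
{
    if (region_is_mapped)
        return translate_mapped(gva);   /* mapped: the address requires translation */
    return gva;                         /* unmapped: the address is used as the physical address */
}

int main(void)
{
    uint64_t gva = 0x1234000;
    printf("mapped:   0x%llx\n", (unsigned long long)resolve(gva, true));
    printf("unmapped: 0x%llx\n", (unsigned long long)resolve(gva, false));
    return 0;
}
```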
If the memory management mechanism only supports a single level of caching, then this scenario does not present a problem since a mapped mode will result in a lookup of the requisite TLB while the unmapped mode will merely cause a bypass of the TLB. However, when multiple levels of caching are provided, then additional actions are needed to address the possibility of a dynamic change in the mapped/unmapped status of a memory region.
In some embodiments, a data field in the guest data array 606 is configured to change if there is a change in the mapped/unmapped status of the corresponding memory region. For example, if the array structure 702 of
At 804, the CAM 602 is checked to determine whether a mapping exists for the guest virtual address within the L1 (uTLB) cache. If the CAM does not include an entry for the guest virtual address, then this means that the L1 cache does not include a mapping for that address. Therefore, the L2 cache is checked for the appropriate address translations. At 810, a lookup is performed within a guest TLB to perform a translation from the guest virtual address to a guest physical address. If the desired mapping data is not found in the guest TLB, then a page walker (e.g., a hardware page walker) is employed to perform the translation and to then store the mapping data in the guest TLB.
Once the guest physical address is identified, another lookup is performed at 812 within a root TLB to perform a translation from the guest physical address to a host physical address. If the desired mapping data is not found in the root TLB, then a page walker is employed to perform the translation between the GPA and the HPA, and to then store the mapping data in the root TLB.
At 814, the mapping data from the L2 cache (guest TLB and root TLB) is stored into the L1 cache (uTLB). The mapping data is stored within the L1 cache so that the next time software on the virtual machine needs to access memory at the same guest virtual address, only a single lookup is needed (within the uTLB) to perform the necessary address translation for the memory access. In particular, mapping data from the root TLB is stored into the root data array 604 and mapping data from the guest TLB is stored into the guest data array 606.
One important item of information that is stored is the current mapped/unmapped status of the memory region of interest. The Unmap bit 704 in the guest data array structure 702 is set to indicate whether the memory region is mapped or unmapped.
The next time that a memory access results in the same guest virtual address being received at 802, then the check at 804 will result in an indication that a mapping exists in the L1 cache for the guest virtual address. However, it is possible that the mapped/unmapped status of the memory region of interest may have changed since the mapping information was cached, e.g., from being mapped to unmapped or vice versa.
At 805, a checking operation is performed to determine whether the mapped/unmapped status of the memory region has changed. This operation can be performed by comparing the current status of the memory region against the status bit in data field 704 of the cached mapping data. If there is a determination at 806 that the mapped/unmapped status of memory region has not changed, then at 808, the mapping data in the L1 cache is accessed to provide the necessary address translation for the desired memory access. If, however, there is a determination at 806 that the mapped/unmapped status of the memory region has changed, then the procedure will invalidate the cached mapping data within the L1 cache and will access the L2 cache to perform the necessary translations to obtain the physical address.
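The validity check at 805/806 may be sketched, again in a non-limiting way, as follows; the entry layout, the current-status query, and the L2 refill helper are illustrative stubs rather than the actual implementation.

```c
/* Sketch of a uTLB lookup that compares the cached Unmap status against the
 * region's current mapped/unmapped status before trusting the L1 entry. */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t gva;
    uint64_t hpa;
    bool     unmap;     /* mapped/unmapped status recorded when the entry was cached */
    bool     valid;
} utlb_entry_t;

/* Stub: in a real system this would reflect the region's current configuration. */
static bool region_currently_unmapped(uint64_t gva) { (void)gva; return false; }

/* Stub for the L2 path (guest TLB, root TLB, page walker), which also refills
 * the uTLB entry as in the earlier control-flow sketch. */
static uint64_t l2_translate_and_refill(uint64_t gva, utlb_entry_t *entry)
{
    uint64_t hpa = gva ^ 0x8000;                     /* placeholder translation */
    *entry = (utlb_entry_t){ .gva = gva, .hpa = hpa,
                             .unmap = region_currently_unmapped(gva),
                             .valid = true };
    return hpa;
}

uint64_t utlb_lookup(uint64_t gva, utlb_entry_t *entry)
{
    /* 804: does the L1 (uTLB) cache hold a mapping for this GVA? */
    if (entry->valid && entry->gva == gva) {
        /* 805/806: has the region's mapped/unmapped status changed since caching? */
        if (entry->unmap == region_currently_unmapped(gva))
            return entry->hpa;            /* 808: use the L1 mapping directly */
        entry->valid = false;             /* status changed: invalidate the cached entry */
    }
    /* 810-814: fall back to the L2 cache (and page walker if needed), then refill L1. */
    return l2_translate_and_refill(gva, entry);
}
```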
Therefore, what has been described is an improved approach for implementing a memory management mechanism in a virtualization environment. Multiple levels of caches are provided to perform address translations, where at least one of the caches contains a mapping between a guest virtual address and a host physical address. This type of caching implementation serves to minimize the need to perform costly multi-stage translations in a virtualization environment.
The present disclosure also describes an approach to implement a lookup structure that includes a content addressable memory (CAM) which is associated with multiple memory components. The CAM provides one or more pointers into the plurality of downstream memory structures. In some embodiments, a TLB for caching address translation mappings is embodied as a combination of a CAM associated with parallel downstream memory structures, where a first memory structure corresponds to host address mappings and a second memory structure corresponds to guest address mappings.
While this invention has been described in terms of several preferred embodiments, there are alterations, permutations, and equivalents, which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. Although various examples are provided herein, it is intended that these examples be illustrative and not limiting with respect to the invention. Further, the Abstract is provided herein for convenience and should not be employed to construe or limit the overall invention, which is expressed in the claims. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.