The present disclosure relates generally to memory management of guest virtual machines and, more particularly, to input-output memory management unit emulation and distribution between virtual devices. Virtualization may be used to provide some physical components as logical objects in order to allow running various software modules, for example, multiple operating systems, concurrently and in isolation from other software modules, on one or more interconnected physical computer systems. Virtualization allows, for example, consolidating multiple physical servers into one physical server running multiple guest virtual machines in order to improve the hardware utilization rate.
Virtualization may be achieved by running a software layer, often referred to as a hypervisor, above the hardware and below the guest virtual machines. A hypervisor may run directly on the server hardware without an operating system beneath it or as an application running on a traditional operating system. A hypervisor may virtualize the physical layer and provide interfaces between the underlying hardware and guest virtual machines. Processor virtualization may be implemented by the hypervisor scheduling time slots on one or more physical processors for a guest virtual machine, rather than a guest virtual machine actually having a dedicated physical processor. The present disclosure provides improved systems and methods for input-output memory management unit emulation in a virtual environment.
The present disclosure provides new and innovative methods and systems for input-output memory management unit emulation. For example, the method includes associating, by a management software, a plurality of devices with a plurality of input-output memory management units. Association includes associating a first device with a first input-output memory management unit having a first security designation. The first device is at least one of a first PCI device and a first PCI bridge. Association further includes associating a second device with a second input-output memory management unit having a second security designation that is different from the first security designation. The second device is at least one of a second PCI device and a second PCI bridge. The hypervisor constructs a table that describes associations between the plurality of devices and the plurality of input-output memory management units. The hypervisor provides the table to a guest virtual machine having a plurality of guest addresses including a first guest address and a second guest address. The first device accesses the first guest address through the first input-output memory management unit, and the second device accesses the second guest address through the second input-output memory management unit.
Additional features and advantages of the disclosed methods and system are described in, and will be apparent from, the following Detailed Description and the Figures.
In computer systems executing a guest virtual machine, devices (e.g., peripheral component interconnect (PCI) devices) may access physical memory (e.g., direct memory access) associated with the guest virtual machine. Typically, an input-output memory management unit is enabled to facilitate access between devices and the guest virtual machine. Input-output memory management units may be used in other ways besides access facilitation. For example, input-output memory management units may be used to protect host memory (e.g., RAM) by supplying a mapping of processes used by devices. Also, for example, input-output memory management units may be used to re-map addresses so that devices supporting only 32-bit access can reach memory above 4 GiB. The input-output memory management unit has the capability to keep devices from accessing the physical address of particular physical memory (e.g., guest virtual machine protection). Instead, input-output memory management units may provide devices with virtual addresses. The input-output memory management unit may translate the virtual address, accessed by devices, into the physical address of particular physical memory. In this way, if a device is malicious (e.g., a corrupt device), it is unable to access physical memory of the guest virtual machine (e.g., RAM) and subsequently cause problems with the guest virtual machine and/or the host physical memory. For example, the input-output memory management unit may ensure that the guest kernel and guest virtual machine are protected against malicious hardware or buggy drivers.
This translation by the input-output memory management unit, while protecting the guest virtual machine, takes additional processing time. For example, input-output memory management unit implementation and translation may lead to a 15-30% performance penalty for all direct memory access capable PCI devices in the guest virtual machine. Additionally, security concerns are not equal for every device that attempts to access the guest virtual machine. For example, some devices may be trusted more than other devices, so the scope of guest virtual machine protection should vary: trusted devices can receive less security (e.g., minimal or none). By giving input-output memory management units individual (e.g., fine-grained) device scopes, multiple devices may access a guest virtual machine while optimizing both security and processing efficiency. The present disclosure describes systems and methods of emulating input-output memory management units, such that input-output memory management units may advantageously vary in their security levels on a device-to-device basis.
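By way of a non-limiting illustration, the following sketch shows how an emulated IOMMU might dispatch on a per-device security designation; the names (e.g., SecurityDesignation, translate) are hypothetical and are not part of the disclosure.

```python
# Minimal sketch (hypothetical names) of per-device security designations and the
# translation work each one implies for an emulated IOMMU.
from enum import Enum, auto

class SecurityDesignation(Enum):
    STANDARD = auto()    # full translation: guest address maps to a different physical address
    ONE_TO_ONE = auto()  # identity mapping: no lookup cost, minimal protection
    DISABLED = auto()    # no IOMMU interposition for this device at all

def translate(designation, guest_addr, page_table):
    """Return the address a device may use for guest_addr under the given designation."""
    if designation in (SecurityDesignation.ONE_TO_ONE, SecurityDesignation.DISABLED):
        return guest_addr                 # same address location, no translation overhead
    return page_table[guest_addr]         # standard translation via a per-device table
```

In this sketch, a trusted device given a one-to-one or disabled designation avoids the lookup entirely, reflecting the reduced overhead described above, while an untrusted device pays for the standard translation.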
As used herein, physical processor or processors 120A-C refer to a device capable of executing instructions encoding arithmetic, logical, and/or I/O operations. In one illustrative example, a processor may follow the Von Neumann architectural model and may include an arithmetic logic unit (ALU), a control unit, and a plurality of registers. In an example embodiment, a processor may be a single-core processor, which is typically capable of executing one instruction at a time (or processing a single pipeline of instructions), or a multi-core processor, which may simultaneously execute multiple instructions. In another example embodiment, a processor may be implemented as a single integrated circuit, two or more integrated circuits, or may be a component of a multi-chip module (e.g., in which individual microprocessor dies are included in a single integrated circuit package and hence share a single socket). A processor may also be referred to as a central processing unit (CPU).
As discussed herein, a memory device 130A-C refers to a volatile or non-volatile memory device, such as RAM, ROM, EEPROM, or any other device capable of storing data. As discussed herein, input/output device 140A-B refers to a device capable of providing an interface between one or more processors and an external device. The external device's operation is based on the processor inputting and/or outputting data.
Processors 120A-C may be interconnected using a variety of techniques, ranging from a point-to-point processor interconnect, to a system area network, such as an Ethernet-based network. Local connections within each node 110A-D, including the connections between a processor 120A and a memory device 130A-B and between a processor 120A and an I/O device 140A, may be provided by one or more local buses of suitable architecture, for example, peripheral component interconnect (PCI). As used herein, a device of the host operating system (host OS) 186 (or host device) may refer to CPU 120A-C, MD 130A-C, I/O 140A-B, a software device, and/or hardware device 150A-B.
As noted above, computer system 100 may run multiple guest virtual machines (e.g., VM 170A-B) by executing a software layer (e.g., hypervisor 180) above the hardware and below the guest virtual machines 170A-B, as schematically shown in the figures.
In an example embodiment, a guest virtual machine 170A-B may execute a guest operating system (guest OS) 196A-B which may utilize the underlying VCPU 190A-B, VMD 192A-B, and VI/O devices 194A-B. One or more applications 198A-D may be running on a guest virtual machine 170A-B under the guest operating system 196A-B. In an example embodiment, a guest virtual machine 170A-B may include multiple virtual processors 190A-B. Processor virtualization may be implemented by the hypervisor 180 scheduling time slots on one or more physical processors 120A-C such that from the guest operating system's perspective those time slots are scheduled on a virtual processor 190A-B.
The hypervisor 180 controls and limits access to memory (e.g., memory allocated to the guest virtual machines 170A-B and memory allocated to the guest operating systems 196A-B, such as guest memory 195A-B provided to guest operating systems 196A-B, etc.). For example, guest memory 195A-B may be divided into a plurality of memory pages. Access to these memory pages is controlled and limited by the hypervisor 180. Likewise, for example, guest memory 195A-B allocated to the guest operating system 196A-B is mapped from host memory 184 such that when a guest application 198A-D or a device (e.g., device 150A) uses or accesses a memory page of guest memory 195A-B, it is actually using or accessing host memory 184. Host memory 184 is also referred to as host physical memory 184, as it physically exists on a computer system (e.g., system 100).
The hypervisor 180 may keep track of how each memory page is mapped, allocated, accessed, and/or used through the use of the input-output memory management unit (IOMMU) 188. An IOMMU may map virtual addresses to physical addresses. In an example embodiment, an IOMMU may be implemented on a host in physical hardware. Also, in an example embodiment, an IOMMU may be emulated on a guest virtual machine, in which case, the emulated IOMMU may map guest virtual addresses to guest physical addresses. For example, the IOMMU 188 maps the device address space (e.g., a bus address) that is relevant to the I/O bus into the physical address space (e.g., a host physical address). The IOMMU 188 may also include extra information associated with the address space mapping, such as read and write permissions. For example, mappings in the IOMMU 188 allow a device (e.g., device 150A) to access a particular address (e.g., a physical address or a virtual address). In an example embodiment, the particular address is a guest address (e.g., guest address, guest virtual address, etc.). In a different example embodiment, the particular address is a physical address (e.g., host physical address, guest physical address, etc.). Likewise, for example, mappings can be removed to prevent direct access, by the device, to the particular address. The mechanism of mapping and unmapping an address allows a host, through a hypervisor 180, to control access to a particular host address in host memory 184. For example, the IOMMU 188 may implement various levels of security (e.g., standard translation, one-to-one translation, disabled translation, etc.) for particular devices (e.g., device 150A-B). As a result, the host can maintain memory integrity by preventing a device from performing illegal transactions or accessing invalid addresses.
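By way of a non-limiting illustration, the following sketch shows the kind of map/unmap bookkeeping, with read and write permissions, that an IOMMU such as IOMMU 188 maintains; the class and method names are hypothetical.

```python
# Minimal sketch (hypothetical names): bus-address mappings with permissions, where
# unmapping an address removes the device's ability to reach the backing memory.
class IommuMapping:
    def __init__(self):
        self._entries = {}  # bus (device-visible) address -> (host physical address, perms)

    def map(self, bus_addr, phys_addr, read=True, write=False):
        self._entries[bus_addr] = (phys_addr, {"read": read, "write": write})

    def unmap(self, bus_addr):
        self._entries.pop(bus_addr, None)   # prevents further direct access by the device

    def access(self, bus_addr, want_write=False):
        entry = self._entries.get(bus_addr)
        if entry is None:
            raise PermissionError("invalid address: no mapping exists for this device")
        phys_addr, perms = entry
        if want_write and not perms["write"]:
            raise PermissionError("write not permitted by the IOMMU mapping")
        return phys_addr
```

Rejecting unmapped or write-protected accesses in this way is what allows the host to maintain memory integrity against illegal transactions or invalid addresses.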
In this manner, the hypervisor 180, through the IOMMU 188, can prevent memory allocated to one guest OS 196A from being inappropriately accessed and/or modified by another guest OS 196B or the host OS 186. Accesses are detected by the guest OS (e.g., 196A) in the guest virtual machine (e.g., VM 170A), which may act as an interface between a host OS (e.g., 186) and the guest OS (e.g., 196A). Similarly, the hypervisor 180, through the IOMMU 188, can prevent memory assigned to or being used by one application 198A from being used by another application 198B. Additionally, the hypervisor 180, through the IOMMU 188, can prevent memory accessible by one node (e.g., 110A) from being used by another node (e.g., 110B). In an example embodiment, the IOMMU 188 is a hardware component that is separate from the VMs 170A-B, nodes 110A-D, the host OS 186, and the hypervisor 180. In a different example embodiment, the IOMMU 188 is emulated, such that it is included within the host OS 186 and/or the VMs 170A-B, and used by the guest OS 196A-B to communicate with the hypervisor 180. For example, VM 170A may include emulated IOMMUs, also referred to as VIOMMUs 171A-173A. Also, for example, VM 170B may include VIOMMUs 171B-173B. In an example embodiment, VIOMMUs are pure stand-alone software constructs that may operate with no interaction with the IOMMU 188, or operate on a system without a host physical IOMMU.
Mappings to memory, stored in the IOMMU 188, are accessible by the hypervisor 180, the VMs 170A-B and each node 110A-D. Through these mappings, the memory itself can be accessed. Likewise, mappings may be used together with any paging data structure used by the VMs 170A-B to support translation from guest OS 196A-B to host OS 186 addresses (e.g., 32-bit linear address space using a two-level hierarchical paging structure, Physical Address Extension mode, INTEL® Extended Memory 64 Technology mode, etc.).
In an example embodiment, virtual machines (e.g., VM 170A) may further include a table 160A. The table 160A may describe associations between the plurality of devices (e.g., first device 150A and second device 150B) and a plurality of emulated IOMMUs or VIOMMUs. For example, the table 160A may include associations with a plurality of VIOMMUs, such as a first VIOMMU 171A, a second VIOMMU 172A, a third VIOMMU 173A, a fourth VIOMMU 171B, a fifth VIOMMU 172B, a sixth VIOMMU 173B, etc. In an example embodiment, emulation involves distributing several IOMMUs (e.g., VIOMMUs) between several devices (e.g., devices 150A-B, virtual devices VI/O 194A-B, etc.) such that each distributed VIOMMU may have single-device scope. Emulation of IOMMUs is discussed in greater detail below.
The table 160A may further include information on each of the VIOMMUs (e.g., the first VIOMMU 171A, the second VIOMMU 172A, the third VIOMMU 173A, etc.). In an example embodiment, the table 160A may reside on other virtual machines (e.g., VM 170B). In an example embodiment, the table 160A is an advanced configuration and power interface table (ACPI table). The table 160A, and its relation to the hypervisor 180, virtual machine 170A, and additional components, is discussed in greater detail below.
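By way of a non-limiting illustration, the following sketch shows the kind of association data a table such as table 160A could carry; the identifiers (TableEntry, "device-150A", "viommu-171A", etc.) are hypothetical labels rather than element names from the disclosure.

```python
# Minimal sketch (hypothetical names): one row per device, naming its emulated IOMMU
# and the security designation that governs translation for that device.
from dataclasses import dataclass

@dataclass
class TableEntry:
    device_id: str      # e.g., a PCI address such as "0000:00:02.0"
    viommu_id: str      # which emulated IOMMU (VIOMMU) handles this device
    designation: str    # e.g., "one-to-one", "standard", or "disabled"

table_160a = [
    TableEntry("device-150A", "viommu-171A", "one-to-one"),
    TableEntry("device-150B", "viommu-172A", "standard"),
]
```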
The management software 200 may also provide access to one or more host operating systems including host OS 186 (as described above). The management software system 200 may provide access to additional host OSs 286A-C. Each of the host OS 186 and additional host OSs 286A-C may be connected to the network 210. Each of the host OS 186 and the additional host OSs 286A-C may be associated with a hypervisor. For example, host OS 186 may be associated with hypervisor 180 (as described above). Each of the additional host OSs 286A-C may be associated with hypervisors 280A-C respectively.
In an example embodiment, a user may interact with the management software system 200 to select at least one guest virtual machine (e.g., VM 170A) and at least one host operating system (e.g., host OS 186 and hypervisor 180) to emulate a virtual environment (e.g., computer system 100). In a different example embodiment, the user may interact with the management software system 200 to select more than one guest virtual machine (e.g., VM 170A-B) and at least one host operating system (e.g., host OS 186 and hypervisor 180) to emulate a virtual environment.
The management software 200 may also provide access to one or more devices including devices 150A-B (as described above) and at least one additional device 250A. Each of the devices 150A-B and additional device 250A may be connected to the network 210. Each of the devices 150A-B and additional device 250A may be selected, by the user, and subsequently associated with one or more guest virtual machines including VMs 170A-B and additional VMs 270A-B and one of host OS 186 and additional host OSs 286A-C. Thus, particular devices (e.g., device 150A and device 150B) are associated with a particular virtual environment (e.g., guest virtual machine 170A, host OS 186, and hypervisor 180). Once the virtual environment has been emulated, the IOMMUs for the virtual environment may be emulated (e.g., first VIOMMU 171A, second VIOMMU 172A, third VIOMMU 173A) and scoped on a device-by-device basis.
The example method 400 starts with a management software associating a plurality of devices with a plurality of input-output memory management units (block 405). For example, management software system 200 may associate devices 150A-B with first VIOMMU 171A and second VIOMMU 172A. This association includes associating a first device 150A with a first IOMMU having a first security designation (e.g., one-to-one translation) (block 406). In an example embodiment, associating the first device 150A with the first VIOMMU 171A includes receiving a first device selection from a user. The first device selection is a selection of the first device 150A from the plurality of devices (e.g., devices 150A-B). Association further includes receiving a first IOMMU selection from the user, where the first IOMMU selection is a selection of the first VIOMMU 171A from the plurality of IOMMUs (e.g., VIOMMUs 171A-173A). In an example embodiment, associating the first device 150A with the first VIOMMU 171A further includes receiving the first security designation defining a relationship between the first VIOMMU 171A and the first device 150A. In an example embodiment, the first device 150A is at least one of a first PCI device and a first PCI bridge. For example, a PCI device is typically a peripheral component interconnect device that may require direct memory access. Also, for example, a PCI bridge may be provided as a software bridge to multiple PCI devices.
This association further includes associating a second device 150B with a second IOMMU having a second security designation that is different from the first security designation (block 407). For example, the second security designation may be standard translation while the first security designation may be one-to-one translation. In an example embodiment, associating the second device 150B with the second VIOMMU 172A further includes receiving a second device selection from the user, where the second device selection is a selection of the second device 150B from the plurality of devices (e.g., devices 150A-B), and receiving a second IOMMU selection from the user, where the second IOMMU selection is a selection of the second VIOMMU 172A from the plurality of VIOMMUs (e.g., VIOMMUs 171A-173A). In an example embodiment, associating the second device 150B with the second VIOMMU 172A further includes receiving the second security designation defining a relationship between the second VIOMMU 172A and the second device 150B. In an example embodiment, the second device 150B is at least one of a second PCI device and a second PCI bridge.
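By way of a non-limiting illustration of blocks 406 and 407, the following sketch shows how management software might record the user-supplied device selection, IOMMU selection, and security designation; the ManagementSoftware class and the string identifiers are hypothetical.

```python
# Minimal sketch (hypothetical names): record each user-selected association of a
# device with an emulated IOMMU and a security designation.
class ManagementSoftware:
    def __init__(self):
        self.associations = {}  # device id -> (VIOMMU id, security designation)

    def associate(self, device_id, viommu_id, designation):
        # Receive a device selection, an IOMMU selection, and a security designation,
        # then associate the selected device with the selected emulated IOMMU.
        self.associations[device_id] = (viommu_id, designation)

mgmt = ManagementSoftware()
mgmt.associate("device-150A", "viommu-171A", "one-to-one")  # block 406
mgmt.associate("device-150B", "viommu-172A", "standard")    # block 407
```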
The method 400 further includes constructing, by a hypervisor 180, a table 160A (block 410). The table 160A describes associations between the plurality of devices 150A-B and the plurality of IOMMUs (e.g., VIOMMUs 171A-172A). The hypervisor 180 may then provide the table 160A to a guest virtual machine (e.g., first guest virtual machine 170A) having a plurality of guest addresses including a first guest address and a second guest address (block 415). For example, the first guest address may be 0100 and the second guest address may be 3FF0. In an example embodiment, the first guest address (e.g., 0100) may be a guest virtual address and the second guest address (e.g., 3FF0) may be a guest physical address. The first device 150A may access the first guest address through the first IOMMU (block 420). For example, the first VIOMMU 171A may have the first security designation of one-to-one translation, such that the first VIOMMU 171A links the first guest virtual address (e.g., address 0100) with a first guest physical address (e.g., 0100) at the same address location. Likewise, the second device 150B may access the second guest address through the second IOMMU (block 425). For example, the second VIOMMU 172A may have the second security designation of standard translation, such that the second VIOMMU 172A links the second guest virtual address (e.g., address 3FF0) with a second guest physical address (e.g., A7F0) at a different address location.
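By way of a non-limiting illustration of blocks 410 through 425, the following sketch (all identifiers hypothetical) walks through the example addresses above: the table routes each device to its own VIOMMU, whose security designation determines whether the guest address is used at the same location or translated to a different one.

```python
# Minimal sketch (hypothetical names) of table construction and per-device access:
# one-to-one translation keeps the address location, standard translation changes it.
viommus = {
    "viommu-171A": {"designation": "one-to-one", "map": {}},              # 0100 -> 0100
    "viommu-172A": {"designation": "standard", "map": {0x3FF0: 0xA7F0}},  # 3FF0 -> A7F0
}
table_160a = {"device-150A": "viommu-171A", "device-150B": "viommu-172A"}  # blocks 410/415

def device_access(device_id, guest_addr):
    viommu = viommus[table_160a[device_id]]
    if viommu["designation"] == "one-to-one":
        return guest_addr                 # same address location, no translation needed
    return viommu["map"][guest_addr]      # standard translation to a different location

assert device_access("device-150A", 0x0100) == 0x0100  # block 420
assert device_access("device-150B", 0x3FF0) == 0xA7F0  # block 425
```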
In the illustrated example embodiment, a user 510 sends, from a client device, a first device selection (e.g., a selection of first device 150A) to a management software system 200. The user 510 sends, from the client device, a first IOMMU selection (e.g., a selection of first VIOMMU 171A) and a first security designation (e.g., a selection of one-to-one translation) to the management software system 200. The management software system 200 associates the first device 150A with the first VIOMMU 171A (block 526).
A user 510 further sends, from the client device, a second device selection (e.g., a selection of second device 150B) to a management software system 200 (block 528). The user 510 sends, from the client device, a second IOMMU selection (e.g., a selection of second VIOMMU 172A) to the management software system 200 (block 530). The user 510 sends, from the client device, a second security designation (e.g., a selection of standard translation) to the management software system 200 (block 532). The management software system 200 associates the second device 150B with the second VIOMMU 172A (block 534).
In an example embodiment, the device selection (e.g., first device selection) and the IOMMU selection (e.g., first IOMMU selection) are entered, by the user 510, using the management software system 200 and/or a client device networked with the management software system 200. For example, the user 510 may enter selections through a command line generated by the management software system 200 and/or a client device networked with the management software system 200. Also, for example, the user 510 may be prompted, by a graphical user interface of the management software system 200, to enter selections. The management software system 200 may generate the graphical user interface, which may indicate the device selections (e.g., the first device selection) and the IOMMU selections (e.g., the first IOMMU selection) that the user 510 makes.
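By way of a non-limiting illustration, a command-line flow for gathering these selections might resemble the following sketch; the prompt wording and function name are hypothetical.

```python
# Minimal sketch (hypothetical names): prompt the user for a device selection, an IOMMU
# selection, and a security designation, as entered through a command line.
def prompt_association():
    device_id = input("Device to associate (e.g., device-150A): ")
    viommu_id = input("Emulated IOMMU to use (e.g., viommu-171A): ")
    designation = input("Security designation [one-to-one/standard/disabled]: ")
    return device_id, viommu_id, designation
```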
In an example embodiment, associating the plurality of devices with the plurality of input-output memory management units, as discussed above in relation to blocks 526 and 534, further includes associating a third device (e.g., device 250A) with a third VIOMMU (e.g., VIOMMU 173A). The third device 250A is at least one of a third PCI device and a third PCI bridge. Likewise, in an example embodiment, associating the plurality of devices with the plurality of input-output memory management units further includes associating a fourth device with the third VIOMMU 173A. The fourth device is at least one of a fourth PCI device and a fourth PCI bridge. For example, both the third device 250A and the fourth device are associated with the third VIOMMU 173A. In another example embodiment, associating the plurality of devices with the plurality of input-output memory management units further includes associating a fifth device with none of the plurality of input-output memory management units. For example, the fifth device has no association with any of the plurality of input-output memory management units.
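By way of a non-limiting illustration (the device and VIOMMU identifiers, and the designations chosen here for the third and fourth devices, are hypothetical), the association data can hold two devices that share one VIOMMU, while a fifth device simply has no entry.

```python
# Minimal sketch (hypothetical names): third and fourth devices share VIOMMU 173A;
# a fifth device has no association with any emulated IOMMU.
associations = {
    "device-150A": ("viommu-171A", "one-to-one"),
    "device-150B": ("viommu-172A", "standard"),
    "device-250A": ("viommu-173A", "standard"),  # third device (designation illustrative)
    "device-4":    ("viommu-173A", "standard"),  # fourth device shares the third VIOMMU
}
assert "device-5" not in associations            # fifth device: no IOMMU association
```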
The hypervisor 180 constructs a table 160A (block 536). The table 160A describes associations between the plurality of devices (e.g., first device 150A and second device 150B) and the plurality of input-output memory management units (e.g., first VIOMMU 171A and second VIOMMU 172A). The table 160A may include additional information such as the first security designation and the second security designation. In an example embodiment, the table 160A is an advanced configuration and power interface table (ACPI table). The hypervisor 180 provides the table 160A to a guest virtual machine 170A (block 538). The guest virtual machine 170A receives the table 160A (block 540). In an example embodiment, the hypervisor 180 is an open source machine emulator.
Once the table 160A has been received by the guest virtual machine 170A, access attempts (e.g., direct memory access attempts) by devices (e.g., first device 150A and second device 150B) will be channeled through the respective VIOMMUs (e.g., first VIOMMU 171A and second VIOMMU 172A).
More particularly, a first device 150A accesses a first guest address 551 (e.g., a first guest virtual address) on the first guest virtual machine 170A (block 542). In an example embodiment, the device accessing the first guest address 551 is a virtual device (e.g., VI/O 194A). Access to the first guest address 551 is made through the first VIOMMU 171A. In an example embodiment, the first VIOMMU 171A has a first security designation of one-to-one translation; thus, the first VIOMMU 171A performs one-to-one translation when the first device 150A accesses the first guest address 551. For example, when the first security designation is one-to-one translation, the first VIOMMU 171A links the first guest address 551 to a first physical address. The first guest address 551 and the first physical address have the same address location (e.g., 0100). For example, a first device 150A accesses the first physical address, which is linked to the first guest address 551. The first VIOMMU 171A provides the linked access, such that the first device 150A may access the first guest address 551 by accessing the first physical address. However, no address translation is necessary because the first guest address 551 and the first physical address have the same address location.
A second device 150B accesses a second guest address 552 (e.g., a second guest virtual address) on the first guest virtual machine 170A (block 544). In an example embodiment, the device accessing the second guest address 552 is a virtual device. Access to the second guest address 552 is made through the second VIOMMU 172A. In an example embodiment, the second VIOMMU 172A has a second security designation of standard translation; thus, the second VIOMMU 172A performs standard translation when the second device 150B accesses the second guest address 552. For example, when the second security designation is standard translation, the second VIOMMU 172A links the second guest address 552 to a second physical address. The second guest address 552 and the second physical address have different address locations (e.g., 3FF0 and A7F0). For example, the second device 150B accesses the second physical address (e.g., A7F0), which is a different address location than the second guest address 552 (e.g., 3FF0). Thus, the second VIOMMU 172A must translate between the second guest address 552 and the second physical address, such that the second device 150B may access the second guest address 552. Accordingly, the presently described systems and methods of IOMMU emulation provide elasticity and flexibility as devices are associated with IOMMUs and security designations.
It will be appreciated that all of the disclosed methods and procedures described herein can be implemented using one or more computer programs or components. These components may be provided as a series of computer instructions on any conventional computer readable medium or machine readable medium, including volatile or non-volatile memory, such as RAM, ROM, flash memory, magnetic or optical disks, optical memory, or other storage media. The instructions may be provided as software or firmware, and/or may be implemented in whole or in part in hardware components such as ASICs, FPGAs, DSPs or any other similar devices. The instructions may be configured to be executed by one or more processors, which when executing the series of computer instructions, performs or facilitates the performance of all or part of the disclosed methods and procedures.
It should be understood that various changes and modifications to the example embodiments described herein will be apparent to those skilled in the art. Such changes and modifications can be made without departing from the spirit and scope of the present subject matter and without diminishing its intended advantages. It is therefore intended that such changes and modifications be covered by the appended claims.