1. Field of the Invention
The invention is related to computing systems and more particularly to performance counters of computing systems.
2. Description of the Related Art
In general, a computing system adjusts operational parameters (e.g., hardware and software parameters) to improve performance. However, policies and computations may be so complex that software, rather than hardware, is used to adjust those operational parameters. The hardware typically provides one or more performance counters to track the occurrence of corresponding events or indicators of hardware performance. User-level processes are typically blocked from accessing those performance counters based on a corresponding privilege level. In a typical non-virtualized system (
The requirement that only the most privileged software can access performance counters limits the performance information that is available to other, less-privileged software. The less-privileged software may receive performance information from the more privileged software layers using, e.g., software emulation, system calls, or hypercalls, which are all operations that have significant performance costs that result in poorly resolved or late information, causing improper application of policies and thus reduced system performance. Performance information may not be available at all to the less-privileged software, leaving the less-privileged software unable to modify its operational parameters in response to dynamic system changes, resulting in degraded performance.
In at least one embodiment of the invention, a method includes updating contents of a value storage element indicating a number of occurrences of an event. The updating is based on contents of a match storage element storing event qualification information. The method includes providing the contents of the value storage element to a first software module executing on at least one processor. The providing is based on contents of a protect storage element indicating access information. In at least one embodiment, the method includes executing the first software module on the at least one processor in a first mode of operation. In at least one embodiment, the method includes executing a second software module on the at least one processor in a second mode of operation. In at least one embodiment, the second mode is more privileged than the first mode.
In at least one embodiment of the invention, an apparatus includes a match storage element configured to store event qualification information. The apparatus includes a value storage element configured to accumulate occurrences of an event in response to an indicator indicating detection of the event based on contents of the match storage element. The apparatus includes a protect storage element configured to store information indicating access to the value storage element by a software module executing on at least one processor. The apparatus includes a control module configured to provide, to the software module, read access to contents of the value storage element based on contents of the protect storage element.
In at least one embodiment of the invention, a tangible computer-readable medium encodes a representation of an integrated circuit that comprises a match storage element configured to store event qualification information. The integrated circuit includes a value storage element configured to accumulate occurrences of an event in response to an indicator indicating detection of the event based on contents of the match storage element. The integrated circuit includes a protect storage element configured to store information indicating access to the value storage element by a software module executing on at least one processor. The integrated circuit includes a control module configured to provide, to the software module, read access to contents of the value storage element based on contents of the protect storage element.
In at least one embodiment of the invention, a computer program product encoded in one or more tangible machine-readable media includes a first sequence of instructions executable with a first privilege level to configure a match storage element to store event qualification information. The computer program product includes a second sequence of instructions executable with the first privilege level to update an operating parameter of a system executing the computer program product based on contents of a value storage element configured to accumulate occurrences of an event detected based on contents of the match storage element.
In at least one embodiment of the invention, a computer program product encoded in one or more tangible machine-readable media includes a first sequence of instructions executable with a privilege level higher than a user privilege level. The first sequence of instructions is executable to configure a protect storage element to store information indicating access to a value storage element. The first sequence of instructions is executable to configure the protect storage element to store information indicating access to a match storage element by a sequence of instructions executable with the user privilege level.
The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
The use of the same reference symbols in different drawings indicates similar or identical items.
A technique for providing secure and virtualizable performance counters facilitates access to the performance counters by software modules having different privilege levels. As referred to herein, a software module is a program, process, or procedure that includes a set of instructions for controlling one or more portions of a computing system. Event count information is stored and managed by a performance counter module that includes value registers (i.e., storage elements) and associated control registers. The technique provides user-level software modules, as well as higher-privileged software modules, efficient access to performance information, which allows those software modules to respond quickly and precisely to system operational changes.
Referring to
Referring to
As referred to herein, a “virtual machine monitor” (VMM, e.g., VMM 202) or “hypervisor” is software that provides the virtualization capability. The VMM provides an interface between the user or guest and the physical resources. Typically, the VMM provides each guest the appearance of full control over a complete computer system (i.e., memory, central processing unit (CPU) and all peripheral devices). A Type 1 (i.e., native) VMM is a standalone software program that executes on physical resources and provides the virtualization for one or more guests. A guest operating system executes on a level above the VMM. A Type 2 (i.e., hosted) VMM is integrated into or executes on an operating system; the operating system components execute directly on physical resources and are not virtualized by the VMM. The VMM is considered a distinct software layer, and a guest operating system may execute on a third software level above the hardware. Techniques described herein may be implemented using a Type 1 VMM, a Type 2 VMM, or other suitable VMM.
Still referring to
In at least one embodiment of processing system 200, VMM 202 is executed by some or all processor cores in the physical resources of processing system 200. An individual guest 206 is executed by one or more of the processor cores included in the physical resources. The processors switch between execution of VMM 202 and execution of one or more guests 206. As referred to herein, a “world switch” is a switch between execution of a guest (i.e., a software module executing in a guest mode of processing system 200) and execution of a host (i.e., a software module executing in a privileged or host mode of processing system 200, e.g., executing VMM 202) or vice versa. In general, a world switch may be initiated by a VMRUN instruction of an AMD Secure Virtual Machine, a VMLAUNCH or VMRESUME virtual machine extension instruction of an Intel virtual machine, interrupt mechanisms, exception mechanisms, predetermined instructions defined by a control block (e.g., VMMCALL), or by other suitable technique. During a world switch, a current processor environment (e.g., processor core(s) executing guest 206 in guest mode or executing VMM 202 in host mode) saves its state information and restores state information for a target processor environment (e.g., processor core(s) executing VMM 202 in host mode or executing guest 206 in guest mode) to which the processor execution is switched. For example, VMM 202 initiates a world switch when VMM 202 executes a guest 206 that was scheduled for execution. Similarly, a world switch from executing guest 206 to executing VMM 202 is made when VMM 202 exercises control over physical resources, e.g., when guest 206 attempts to access a peripheral device, when guest 206 attempts to access a performance counter, when a new page of memory is to be allocated to guest 206, or when it is time for VMM 202 to schedule another guest 206, etc. A typical world switch can take thousands of cycles.
Virtualization techniques may be implemented using only software (which includes firmware) or by a combination of software and hardware (which includes microcode). For example, some processors include virtualization hardware, which allows simplification of VMM code and improves system performance for full virtualization (e.g., hardware extensions for virtualization provided by AMD-V and Intel VT-x). For example, AMD-V is an AMD64 extension that effectively provides a super-privileged operating mode in which a VMM can control a guest operating system.
In at least one embodiment of system 100, rather than requiring the VMM to emulate devices to route I/O requests from guest operating system drivers to manage access to common memory space and to restrict real device access to kernel mode drivers, virtualization techniques are further supported by IOMMU 105. IOMMU 105 is an MMU that couples a Direct Memory Access (DMA) capable input/output (I/O) bus to memory 106. As described above, MMU 107 translates processor-visible virtual addresses to physical addresses. Similarly, IOMMU 105 translates device-visible virtual addresses (i.e., device addresses or I/O addresses) to physical addresses. In at least one embodiment, IOMMU 105 provides DMA address translation and permission checking for device reads and writes. IOMMU 105 allows an unmodified driver in a guest OS to directly access a target device, without the overhead of running through a VMM (i.e., without a world switch) and without device emulation.
In at least one embodiment, IOMMU 105 translates addresses from device requests into system memory addresses and checks appropriate permissions on each access to provide memory protection from misbehaving devices. In at least one embodiment, IOMMU 105 is included as part of a HyperTransport™ or PCI bridge device. Embodiments of system 100 that include multiple HyperTransport™ links between processors and I/O hubs also include multiple IOMMUs. In at least one embodiment, IOMMU 105 assigns each of device(s) 108 a protection domain that defines I/O page translations used for each device in the domain. The protection domain specifies access permissions for each I/O page. In at least one embodiment, VMM 202 assigns all devices assigned to a particular guest operating system 208 the same protection domain, which creates a consistent set of address translations and access restrictions used by all devices running under control of the particular guest operating system 208. In at least one embodiment, VMM 202 configures I/O page tables to map system physical addresses to guest physical addresses, configures a protection domain for guest operating system 208, and then allows guest operating system 208 to execute. Drivers written for the real device execute as part of guest operating system 208 unmodified and unaware of underlying translations. Guest operating system transactions are isolated from those of other guests by I/O mapping provided by IOMMU 105.
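For purposes of illustration only, the translation and permission checking described above may be sketched in software as follows. The sketch is a simplified model, not a description of any actual IOMMU implementation: the 4 KiB page size, the single-level table, the permission encoding, and all names (e.g., ToyIommu, ProtectionDomain) are assumptions introduced for the example.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class ProtectionDomain:
    # Hypothetical single-level I/O page table:
    # I/O page number -> (system physical page number, set of "r"/"w").
    io_pages: dict = field(default_factory=dict)


class ToyIommu:
    def __init__(self):
        self.domain_of_device = {}   # DeviceID -> ProtectionDomain

    def assign(self, device_id, domain):
        self.domain_of_device[device_id] = domain

    def translate(self, device_id, io_addr, access):
        """Return the system physical address or raise PermissionError."""
        domain = self.domain_of_device.get(device_id)
        if domain is None:
            raise PermissionError("device has no protection domain")
        entry = domain.io_pages.get(io_addr >> PAGE_SHIFT)
        if entry is None:
            raise PermissionError("I/O page not mapped")
        phys_page, perms = entry
        if access not in perms:
            raise PermissionError("access not permitted")
        return (phys_page << PAGE_SHIFT) | (io_addr & PAGE_MASK)


# Two devices assigned to the same guest share one protection domain,
# so both see the same translations and access restrictions.
guest_domain = ProtectionDomain({0x10: (0x8A, {"r", "w"}), 0x11: (0x8B, {"r"})})
iommu = ToyIommu()
iommu.assign(device_id=3, domain=guest_domain)
iommu.assign(device_id=4, domain=guest_domain)
```

In this model, a write by either device to the read-only I/O page is rejected, which corresponds to the memory protection from misbehaving devices described above.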
In at least one embodiment, IOMMU 105 includes performance counter module 500, which facilitates secure and virtualizable performance counters. In at least one embodiment of system 200, performance counter module 500 is located in a separate module coupled between IOMMU and memory 106. Referring to
Referring to
In at least one embodiment of performance counter module 500, match register 508 is configured to select, based on a device identifier (DeviceID), a particular device for which events are counted. In at least one embodiment of performance counter module 500, match register 510 is configured to select a Process Address Space Identifier (PASID) that is used to identify an application address space within an x86-canonical guest virtual machine. The PASID is used on a peripheral to isolate concurrent contexts residing in a shared local memory. Together, the PASID and DeviceID information uniquely identify an application address space. Note that use of the PASID and DeviceID for specifying an event to be counted is exemplary only and the match register(s) may be configured for events qualified based on additional or other criteria. In at least one embodiment, the match registers include a field that can be used to cause the hardware to ignore actual comparison results and always indicate no match. That field may be used to disable counting of events (e.g., temporarily). In at least one embodiment, the match registers include a field that can be used to cause the hardware to ignore actual comparison results and always indicate a match. That field is useful to match on all PASID values or match on all DeviceID values. In at least one embodiment, the match registers include a filter field that causes the hardware to ignore certain bits of a field in a comparison. That field is useful to count events for select groups of values (e.g., count events for all PASID values from 0 through 7, inclusive, or count events for all DeviceIDs from 0 through 127). In at least one embodiment, the match registers include a min and/or max field so that the comparison is for a range of values, as programmed by software. Note that combinations of multiple match registers may be configured in complex ways to at least partially determine an event to be counted.
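For purposes of illustration only, the match register fields described above (always match, never match, bit filter, and min/max range) may be modeled as follows. The field names, the precedence among the fields, and the example values are assumptions made for the sketch and are not specified herein.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchRegister:
    value: int = 0
    ignore_mask: int = 0              # bits set here are excluded from the comparison
    always_match: bool = False
    never_match: bool = False         # assumed to take precedence; disables counting
    min_value: Optional[int] = None   # optional inclusive range bounds
    max_value: Optional[int] = None

    def matches(self, observed: int) -> bool:
        if self.never_match:
            return False
        if self.always_match:
            return True
        if self.min_value is not None or self.max_value is not None:
            low = self.min_value if self.min_value is not None else 0
            high = self.max_value if self.max_value is not None else observed
            return low <= observed <= high
        care = ~self.ignore_mask      # compare only the bits that are not filtered out
        return (observed & care) == (self.value & care)


# Ignoring the low seven bits matches DeviceIDs 0 through 127.
device_match = MatchRegister(value=0, ignore_mask=0x7F)
# A hypothetical inclusive range that a bit filter alone could not express.
pasid_range = MatchRegister(min_value=16, max_value=20)
```

Note that a bit filter with a base value of zero can only select a power-of-two-sized group of values, whereas the min/max form selects an arbitrary contiguous range.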
In at least one embodiment of performance counter module 500, each performance counter is associated with one or more attribute registers (e.g., attribute register 514) that are configured to select the type of event to be counted for the device specified by one or more corresponding match registers (e.g., match register 508 and match register 510). For example, an attribute register may indicate that the event is a hit of a Translation Lookaside Buffer (TLB) of the IOMMU for a selected value of a DeviceID and a selected value of PASID. Other events that may be counted include a number of interrupts, a number of page faults, a number of instructions executed, a number of I/O operations processed, and a number of times an attempt to read memory is satisfied by a cache. For security purposes, one or more of those parameters (e.g., one or more of the contents of the match registers and attribute registers) are locked, e.g., a user-level process is not allowed to change the DeviceID although the user-level process may be allowed to change the event being counted (e.g., TLB hit) or the PASID being matched.
In at least one embodiment of performance counter module 500, protection above and beyond the protection provided by memory page access controls includes providing one or more protect registers (e.g., protect register 516 and protect register 518) that indicate to control module 502 whether or not a particular register of register set 530 can be changed or viewed by a particular software module. In at least one embodiment of performance counter module 500, at least one protect register is configured to determine whether or not value register 506, match register 508, match register 510, and/or attribute register 514 can be modified by a particular software module. In at least one embodiment, the protect registers allow more privileged software to decide which devices and registers may be viewed and/or changed by less privileged software modules. Access control can be provided on a register-by-register basis. In at least one embodiment of performance counter module 500, one or more protect registers control whether or not the corresponding match register can be written by a particular software module. In other embodiments of performance counter module 500, a read of the match register by a particular software module may be obscured by control module 502 based on contents of the protect register(s).
In at least one embodiment of performance counter module 500, virtual machine monitor 202 retains control over the protect register(s) and determines whether or not a particular software module (e.g., a guest operating system or a user-level process) may view or change the corresponding match register(s) of the register set for the performance counter. In at least one embodiment of performance counter module 500, a protect register is configured to prevent a user-level process from directly changing either the PASID or DeviceID programmed into the match registers unless the process makes the change request via the associated operating system. In at least one embodiment of performance counter module 500, a protect register is configured to prevent a guest operating system from changing the DeviceID programmed into a match register unless it makes the change request via the virtual machine monitor, but is configured to allow the guest operating system to change the PASID programmed into a corresponding match register. In at least one embodiment of performance counter module 500, virtual machine monitor 202 retains control of the protect register(s) associated with register set 530 and can thereby change the contents of any register in register set 530.
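For purposes of illustration only, the register-by-register access control described above may be modeled as follows. The three privilege levels, the encoding of a protect register as a minimum privilege per access type, and the choice to return zero for an obscured read are all assumptions made for the sketch.

```python
from enum import IntEnum


class Privilege(IntEnum):
    USER = 0
    GUEST_OS = 1
    VMM = 2


class ProtectedRegister:
    """A register guarded by hypothetical protect fields."""

    def __init__(self, name, value=0,
                 min_read=Privilege.USER, min_write=Privilege.VMM):
        self.name = name
        self._value = value
        self.min_read = min_read      # lowest privilege allowed to view
        self.min_write = min_write    # lowest privilege allowed to change

    def set_protection(self, caller, min_read, min_write):
        # In this model only the VMM controls the protect fields.
        if caller != Privilege.VMM:
            raise PermissionError("only the VMM may change protection")
        self.min_read = min_read
        self.min_write = min_write

    def read(self, caller):
        # An under-privileged read is obscured rather than rejected (assumed).
        return self._value if caller >= self.min_read else 0

    def write(self, caller, value):
        if caller < self.min_write:
            raise PermissionError(f"{self.name}: write not permitted")
        self._value = value


# The guest operating system may change the PASID but not the DeviceID.
device_id_match = ProtectedRegister("DeviceID match", 3, min_write=Privilege.VMM)
pasid_match = ProtectedRegister("PASID match", 7, min_write=Privilege.GUEST_OS)
pasid_match.write(Privilege.GUEST_OS, 9)
```

In this model, a user-level process that needs a different PASID or DeviceID would have to request the change from software that holds sufficient privilege, consistent with the request paths described above.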
In at least one embodiment of performance counter module 500, detection module 504 compares the contents of an IOMMU instruction buffer to the contents of at least one control register (e.g., match register) in register set 530. If detection module 504 detects a match for those control registers associated with a particular value register for a current event consistent with any corresponding attribute register, then the event is detected and detection module 504 updates the value register accordingly (e.g., increments or decrements the value register according to the design of the value register). The contents of the value register are made accessible to one or more software modules by the IOMMU and/or based on any corresponding protect register of performance counter module 500. The software module may then use the information in the value register to update system parameters.
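For purposes of illustration only, the detection step described above may be modeled as follows. The sketch assumes exact-value matching, an incrementing value register, and string event names such as "tlb_hit"; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CounterSet:
    device_id_match: int   # stands in for a DeviceID match register
    pasid_match: int       # stands in for a PASID match register
    event_type: str        # stands in for the attribute register
    value: int = 0         # stands in for the value register


@dataclass(frozen=True)
class Event:
    kind: str
    device_id: int
    pasid: int


def detect(counter_sets, event):
    """Increment every counter whose attribute and match settings fit the event."""
    for counters in counter_sets:
        if (counters.event_type == event.kind
                and counters.device_id_match == event.device_id
                and counters.pasid_match == event.pasid):
            counters.value += 1


tlb_hits = CounterSet(device_id_match=3, pasid_match=9, event_type="tlb_hit")
page_faults = CounterSet(device_id_match=3, pasid_match=9, event_type="page_fault")

stream = [
    Event("tlb_hit", 3, 9),
    Event("tlb_hit", 3, 8),      # different PASID: not counted
    Event("page_fault", 3, 9),
    Event("tlb_hit", 3, 9),
]
for ev in stream:
    detect([tlb_hits, page_faults], ev)
```

After the four events above are processed, the TLB hit counter in this model holds two and the page fault counter holds one, because the event with a non-matching PASID is ignored.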
Thus, performance counter module 500 allows different software modules to receive performance information that is of interest to the particular software module and/or according to the privilege level of that software module. Those register sets 530 and associated modules (e.g., control module 502 and detection module 504) provide fast access to the most current information to software modules having different levels of privilege, while reducing or preventing opportunities for less privileged processes to perturb this information. The techniques described herein may be applied to additional levels of privilege and domains of isolation.
As described above, the techniques described herein allow less-privileged software to efficiently and quickly access IOMMU performance counter information with relatively low overhead (i.e., the cost of a hardware access rather than the cost of software-mediated access, which may require a world switch). Meanwhile, the virtual machine monitor and operating system retain control of changes to the information, thereby maintaining system isolation and protection properties. Instead of taking dozens, hundreds, or even thousands of instructions to read or change performance information, a read of performance information consistent with the techniques described herein only takes a few cycles. Accordingly, software can obtain accurate, current performance information at any rate that it determines is necessary without throttling access or sampling rates, and without deferring usage of the performance information. As a result, software can adapt quickly and efficiently to changes in system behavior. In embodiments of system 200 that allocate much control of the system to threads and user-level processes executing on the system, performance counter module 500 allows those threads and user-level processes to efficiently access performance information and respond quickly and precisely to system operational changes.
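For purposes of illustration only, the following sketch shows one way a software module might turn two successive reads of value registers into an update of an operating parameter. The 48-bit counter width, the existence of a TLB miss counter, the threshold, and the buffer-sizing policy are all assumptions made for the example; none is specified herein.

```python
COUNTER_BITS = 48  # assumed counter width


def delta(prev, curr, bits=COUNTER_BITS):
    """Events counted between two reads, tolerating a single wraparound."""
    return (curr - prev) % (1 << bits)


def tlb_hit_rate(hits_prev, hits_curr, misses_prev, misses_curr):
    """Hit rate over the sampling interval, or None if nothing was counted."""
    hits = delta(hits_prev, hits_curr)
    misses = delta(misses_prev, misses_curr)
    total = hits + misses
    return hits / total if total else None


def next_buffer_pages(current_pages, hit_rate, low=0.90, max_pages=64):
    """Hypothetical policy: halve the device working set when the hit rate is low."""
    if hit_rate is not None and hit_rate < low:
        return max(1, current_pages // 2)
    return min(max_pages, current_pages)


rate = tlb_hit_rate(hits_prev=100, hits_curr=900, misses_prev=50, misses_curr=250)
pages = next_buffer_pages(current_pages=32, hit_rate=rate)
```

Because each read is a direct hardware access in the described technique, such a policy could be evaluated at whatever interval the software module chooses.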
While circuits and physical structures have been generally presumed in describing embodiments of the invention, it is well recognized that in modern semiconductor design and fabrication, physical structures and circuits may be embodied in computer-readable descriptive form suitable for use in subsequent design, simulation, test or fabrication stages. Structures and functionality presented as discrete components in the exemplary configurations may be implemented as a combined structure or component. Various embodiments of the invention are contemplated to include circuits, systems of circuits, related methods, and tangible computer-readable medium having encodings thereon (e.g., VHSIC Hardware Description Language (VHDL), Verilog, GDSII data, Electronic Design Interchange Format (EDIF), and/or Gerber file) of such circuits, systems, and methods, all as described herein, and as defined in the appended claims. In addition, the computer-readable media may store instructions as well as data that can be used to implement the invention. The instructions/data may be related to hardware, software, firmware or combinations thereof.
Structures described herein may be implemented using software executing on a processor (which includes firmware) or by a combination of software and hardware. Software, as described herein, may be encoded in at least one tangible computer readable medium. As referred to herein, a tangible computer-readable medium includes at least a disk, tape, or other magnetic, optical, or electronic storage medium.
The description of certain embodiments of the invention set forth herein is illustrative, and is not intended to limit the scope of the invention as set forth in the following claims. For example, while the invention has been described in an embodiment in which performance counters for events of an IOMMU are managed, one of skill in the art will appreciate that the teachings herein can be utilized with performance counters associated with other control modules of a computing system, and performance counter module 500 may be located and configured accordingly. For example, techniques described herein may be applied to MMU 107, performance counters for processors 102, devices 108, memory 106/110, and/or other system modules having events to be counted. Variations and modifications of the embodiments disclosed herein may be made based on the description set forth herein, without departing from the scope and spirit of the invention as set forth in the following claims.