In a virtualized computing environment, the underlying computer hardware is isolated from the operating system and application software of one or more virtualized entities. The virtualized entities, referred to as virtual machines, can thereby share the hardware resources while appearing or interacting with users as individual computer systems. For example, a server can concurrently execute multiple virtual machines, whereby each of the multiple virtual machines behaves as an individual computer system but shares resources of the server with the other virtual machines.
In one common virtualized computing environment, the host machine is the actual physical machine, and the guest system is the virtual machine. The host system allocates a certain amount of its physical resources to each of the virtual machines so that each virtual machine can use the allocated resources to execute applications, including operating systems (referred to as “guest operating systems”). For example, the host system can include physical devices attached to the PCI Express bus, such as a graphics card, a memory storage device, or a network interface device. When a PCI Express device is virtualized, it includes a corresponding virtual function for each virtual machine of at least a subset of the virtual machines executing on the host system. As such, the virtual functions provide a conduit for sending and receiving data between the physical device and the virtual machines. To this end, virtualized computing environments support efficient use of computer resources but also require careful management of those resources to ensure secure and proper operation of each of the virtual machines.
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
Performance counters are used to provide information as to how an aspect of a processing system, such as an operating system or an application, service, or driver, is performing. The performance counter data is employed to identify and remedy specified processing issues, such as system bottlenecks. Applications executing on a processing unit such as a graphics processing unit (GPU) configure registers in the GPU as performance counters that are used to monitor events that occur in the processing unit. A performance counter is incremented in response to the corresponding event occurring. For example, a performance counter that is configured to monitor read operations to a memory is incremented in response to each read of a location in the memory. As another example, a performance counter configured to monitor write operations to the memory is incremented in response to each write to a location in the memory. In some cases, the values of the performance counters are streamed to a memory such as a DRAM that collects and stores state information for subsequent inspection by the software application. For example, values of the performance counters can be written to the memory once per second, ten times per second, at other time intervals, or in response to various events occurring in the processing unit.
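The counter behavior described above can be sketched as follows. This is an illustrative model only, not an actual GPU interface: the class and event names (`PerfCounter`, `mem_read`, `mem_write`) are invented for the example. A counter register is bound to an event type, increments on each matching event, and its value is periodically sampled to a memory log.

```python
# Hypothetical model of a performance counter: bound to one event type,
# incremented per matching event, periodically sampled to memory.
class PerfCounter:
    def __init__(self, event):
        self.event = event   # monitored event, e.g. "mem_read"
        self.value = 0

    def observe(self, event):
        # Increment only when the monitored event occurs.
        if event == self.event:
            self.value += 1

reads = PerfCounter("mem_read")
writes = PerfCounter("mem_write")
memory_log = []  # stands in for the DRAM buffer that collects samples

for event in ["mem_read", "mem_write", "mem_read", "mem_read"]:
    reads.observe(event)
    writes.observe(event)

# Sample both counters to memory, as would happen once per interval.
memory_log.append({"mem_read": reads.value, "mem_write": writes.value})
```

In a real system the sampling step would be driven by a timer or by processing-unit events rather than by inline code, as the paragraph above notes.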
However, current virtualization schemes do not provide a mechanism for sharing, allocating, or deallocating registers for use as performance counters for different virtual functions associated with virtual machines. Consequently, virtualization systems are unable to stream performance counter values to memory for subsequent inspection or other uses. The present disclosure describes techniques for allocating performance counters to virtual machines in response to requests obtained from a virtual function associated with the virtual machine.
The graphics processing unit (GPU) 106 is employed by the computing device 103 to create images for output to a display (not shown) according to some embodiments. In some embodiments, the GPU 106 is used to provide additional or alternate functionality such as compute functionality where highly parallel computations are performed. The GPU 106 includes an internal (or on-chip) memory that includes a frame buffer and a local data store (LDS) or global data store (GDS), as well as caches, registers, or other buffers utilized by the compute units or any fixed-function units of the GPU 106.
The computing device 103 supports virtualization that allows multiple virtual machines 112(a)-112(n) to execute at the device 103. Some virtual machines 112(a)-112(n) implement an operating system that allows the virtual machine 112(a)-112(n) to emulate a physical machine. Other virtual machines 112(a)-112(n) are designed to execute code in a platform-independent environment. A hypervisor (not shown) creates and runs the virtual machines 112(a)-112(n) on the computing device 103. The virtual environment implemented on the GPU 106 provides virtual functions 115(a)-115(n) to other virtual components implemented on a physical machine to use the hardware resources of the GPU 106. Each virtual machine 112(a)-112(n) executes as a separate process that uses the hardware resources of the GPU 106. In some embodiments, the GPU 106 associated with the computing device 103 is configured to execute a plurality of virtual machines 112(a)-112(n). In this exemplary embodiment, each of the plurality of virtual machines 112(a)-112(n) is associated with at least one virtual function 115(a)-115(n). In another embodiment, at least one of the virtual machines 112(a)-112(n) is not associated with a corresponding virtual function 115(a)-115(n). In yet another embodiment, at least one of the virtual machines 112(a)-112(n) is associated with multiple virtual functions 115(a)-115(n). In response to receiving a request from a respective one of the virtual functions 115(a)-115(n) for a performance counter 121(a)-121(n), a register configured to be implemented as the performance counter 121(a)-121(n) is allocated to the requesting virtual function 115(a)-115(n), as described below.
A single physical function implemented in the GPU 106 is used to support one or more virtual functions 115(a)-115(n). The hypervisor allocates the virtual functions 115(a)-115(n) to one or more virtual machines 112(a)-112(n) to run on the physical GPU on a time-sliced basis. In some embodiments, each of the virtual functions 115(a)-115(n) shares one or more physical resources of the computing device 103 with the physical function and other virtual functions 115(a)-115(n). Software resources associated with data transfer are directly available to a respective one of the virtual functions 115(a)-115(n) during a specific time slice and are isolated from use by the other virtual functions 115(a)-115(n).
The security processor 118 is configured to allocate, via a controller, a register associated with a processor to the virtual function 115(a)-115(n), such that the register is configured to implement the performance counter 121(a)-121(n). In some embodiments, the security processor 118 functions as a dedicated computer on a chip or a microcontroller integrated in the GPU 106 that is configured to carry out security operations. To this end, the security processor 118 is a mechanism to authenticate the platform and software to protect the integrity and privacy of applications during execution.
Performance counters 121(a)-121(n) are a set of special-purpose registers built into the GPU 106 to store the counts of activities or events within computer systems. In some embodiments, performance counters 121(a)-121(n) are used to monitor events that occur in the virtual functions 115(a)-115(n) associated with the virtual machines 112(a)-112(n) in the GPU 106.
The memory 125 stores data that is accessible to the computing device 103. The memory 125 may be representative of a plurality of memories 125 as can be appreciated. The memory 125 is configured to store program code, as well as state information associated with each of the virtual functions 115(a)-115(n), performance data associated with each of the performance counters 121(a)-121(n), and/or other data. The data stored in the memory 125, for example, is associated with the operation of various applications and/or functional entities as described below.
Various embodiments of the present disclosure facilitate techniques for allocating registers configured to be implemented as performance counters 121(a)-121(n) to virtual functions 115(a)-115(n) in response to a request from a respective one of the virtual functions 115(a)-115(n) associated with a virtual machine 112(a)-112(n) executing on the GPU 106. For example, in some embodiments, the virtual function 115(a)-115(n) sends a request to the security processor 118 to allocate at least one register configured to be implemented as a performance counter 121(a)-121(n) to the requesting virtual function 115(a)-115(n). In response to the request, the security processor 118 determines whether the virtual function 115(a)-115(n) that issued the request is authorized to access the register or set of registers requested by the virtual function 115(a)-115(n). For example, in some embodiments, the security processor 118 determines whether the register or set of registers requested by the virtual function 115(a) is within a permitted set of registers. To this end, in some embodiments, the security processor 118 is configured to implement a mask that identifies ranges of registers or individual registers that are available to be configured as performance counters 121(a)-121(n). The mask is applied to filter the requests from the virtual functions 115(a)-115(n) based on the register or registers indicated in the request.
Upon a determination that the request from the virtual function 115(a)-115(n) is unauthorized, the security processor 118 is configured to deny access to the register or set of registers requested by the virtual function 115(a)-115(n). Alternatively, upon a determination that the virtual function 115(a)-115(n) is authorized to access the register or set of registers requested by the virtual function 115(a)-115(n), the security processor 118 allocates the register or set of registers configured to be implemented as performance counters 121(a)-121(n) to the requesting virtual function 115(a)-115(n).
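The mask-based authorization check described in the two paragraphs above can be sketched as a simple range filter. The permitted ranges, register numbers, and function name below are assumptions invented for illustration, not an actual security-processor interface: a request is granted only if every requested register falls inside a permitted range.

```python
# Hypothetical permitted set of registers (the "mask"): ranges of register
# offsets that may be configured as performance counters.
PERMITTED_RANGES = [(0x100, 0x1FF), (0x400, 0x47F)]

def request_registers(regs):
    """Return the allocated registers, or None if the request is denied."""
    for reg in regs:
        # Deny the whole request if any register is outside the mask.
        if not any(lo <= reg <= hi for lo, hi in PERMITTED_RANGES):
            return None
    # Every register passed the mask: allocate the requested set.
    return list(regs)

granted = request_registers([0x110, 0x120])   # both inside 0x100-0x1FF
denied = request_registers([0x110, 0x300])    # 0x300 is outside the mask
```

This all-or-nothing grant mirrors the disclosed behavior, in which an unauthorized request is denied access rather than partially satisfied.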
The firewall 206 is a hardware security mechanism configured to securely filter communication between computing devices. To this end, the firewall 206 is configured to form a barrier between untrusted computing entities and trusted computing entities. The tap delays 229 and the SPM data 231 are trusted computing entities that are communicated to the RLC 203.
The stream performance monitoring (SPM) tool 209 is configured to utilize the driver 210 to identify the respective one of the virtual functions 115(a)-115(n) currently executing on a virtual machine 112(a)-112(n). The SPM tool 209 is also configured to maintain a list of registers that define memory addresses and other properties for the SPM in the user local frame buffer 216. The list of registers maintained in the user local frame buffer 216 by the SPM tool 209 includes information such as, for example, the performance monitor register list (addr) 218, which corresponds to information identifying physical addresses of the registers that are allocated to a virtual function 115(a)-115(n) that is currently executing on a virtual machine 112(a)-112(n), virtual addresses of the registers, the performance monitor register list (data) 221, and/or other information. The user local frame buffer 216 also includes information such as, for example, the muxsel 223. The muxsel 223 includes data indicating which performance counters 121(a)-121(n) are associated with the virtual functions 115(a)-115(n), data indicating which events are being monitored by the performance counters 121(a)-121(n), and/or other data. The user local frame buffer 216 also includes an SPM ring buffer 226, which is a data queue where the SPM tool 209 stores the information related to the registers configured to be implemented as performance counters 121(a)-121(n) by the requesting virtual function 115(a)-115(n).
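The SPM ring buffer 226 described above is a fixed-size queue. A minimal sketch of such a ring buffer follows; the capacity, record format, and class name are assumptions for illustration and do not reflect the actual SPM implementation. Once the buffer is full, each new record overwrites the oldest one.

```python
# Minimal fixed-capacity ring buffer, as a sketch of how register records
# might be queued; capacity and record contents are illustrative.
class RingBuffer:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0    # next slot to write
        self.count = 0   # number of valid records

    def push(self, record):
        # Overwrite the oldest entry once the buffer is full.
        self.buf[self.head] = record
        self.head = (self.head + 1) % len(self.buf)
        self.count = min(self.count + 1, len(self.buf))

    def items(self):
        # Return the stored records oldest-first.
        start = (self.head - self.count) % len(self.buf)
        return [self.buf[(start + i) % len(self.buf)] for i in range(self.count)]

rb = RingBuffer(3)
for rec in ["r0", "r1", "r2", "r3"]:  # fourth push evicts the oldest record
    rb.push(rec)
```

A hardware ring buffer behaves analogously: a producer advances a write pointer through a fixed memory region while consumers read behind it.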
In an exemplary embodiment, the RLC 203 allocates at least one register configured to be implemented as a performance counter 121(a)-121(n) to a virtual function 115(a)-115(n) in response to a request received from the virtual function 115(a)-115(n). The request can include a physical address or a virtual address of the register. For example, the RLC 203 is configured to grant access to any register requested by a virtual function because the RLC 203 is a trusted entity. However, the process of requesting a register to implement a performance counter 121(a)-121(n) for a virtual function 115(a)-115(n) and selecting the register at the RLC 203, e.g., by adding the register to the performance counter list maintained by the RLC 203, is a security risk. Therefore, in some embodiments of the present disclosure, the firewall 206 is implemented as a security hardware mechanism by the processing unit to receive the requests from the virtual functions 115(a)-115(n) and forward requests that are within a permitted set of registers to the RLC 203. For example, in one embodiment, the security hardware mechanism is configured to implement a mask that identifies a range of registers or individual registers that are available to be configured as performance counters 121(a)-121(n). The mask is applied to filter the requests from the virtual functions 115(a)-115(n) based on the register or registers indicated in the request.
The virtual functions 115(a)-115(n) are untrusted entities that use virtual addresses to identify the locations of the registers. In some embodiments, a page table that maps the virtual addresses to the physical addresses is populated after a restore operation is performed to restore the performance counter registers based on a stored image of the registers. Therefore, in some embodiments, the RLC 203 and a restored virtual function 115(a)-115(n) are not coordinated, and the mapping of the physical addresses used by the RLC 203 to the virtual addresses used by the restored virtual function 115(a)-115(n) differs. In other embodiments, the RLC 203 and the restored virtual functions 115(a)-115(n) are coordinated to ensure that the virtual addresses used to identify the registers associated with the virtual functions 115(a)-115(n) are mapped to the physical addresses used to identify the registers to the RLC 203 prior to construction of the corresponding page table.
In another embodiment, the RLC 203 is also configured to allocate registers to a virtual function 115(a)-115(n) based on state information retrieved in response to the virtual function 115(a)-115(n) being restored to operation on the virtual machine 112(a)-112(n). For example, when a respective one of the virtual machines 112(a)-112(n) is restored on a computing device 103 (
Once the state information associated with the restored virtual function 115(a)-115(n) is retrieved, the RLC 203 is configured to allocate a set of registers associated with the performance counters 121(a)-121(n) to the restored virtual function 115(a)-115(n) based on the state information associated with the restored virtual function 115(a)-115(n). The registers associated with the performance counters 121(a)-121(n) are, however, restored to a default value (such as zero) in response to a restore operation rather than to their previously stored values. Once a virtual function 115(a)-115(n) is restored and is executing on the virtual machine 112(a)-112(n), values of the performance counters 121(a)-121(n) are streamed to memory 125 (
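The restore behavior described above can be sketched as follows; the register names and the `restore_counters` helper are hypothetical. Each register allocated to the restored virtual function starts from a default value of zero rather than from its pre-save contents, and subsequent counter samples are streamed to memory.

```python
DEFAULT = 0  # default value applied by the restore operation

def restore_counters(allocated_regs):
    # Every register allocated to the restored virtual function starts
    # from the default value instead of its previously stored contents.
    return {reg: DEFAULT for reg in allocated_regs}

counters = restore_counters(["spm0", "spm1"])
counters["spm0"] += 5          # events observed after the restore
stream = [dict(counters)]      # one sample streamed to memory
```

Resetting to a default avoids restoring stale or forged counter values from the saved image, which is consistent with the security concerns discussed elsewhere in this disclosure.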
Referring next to
The flowchart of
Beginning with block 403, the performance counter allocation system 200 is invoked to perform an allocation of performance counters 121(a)-121(n) to a respective one of the virtual functions 115(a)-115(n) (
In some embodiments, the apparatus and techniques described above are implemented in a system including one or more integrated circuit (IC) devices (also referred to as integrated circuit packages or microchips), such as the performance counter allocation system 200 described above with reference to
A computer readable storage medium may include any non-transitory storage medium, or combination of non-transitory storage media, accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media can include, but is not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-Ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media. The computer readable storage medium may be embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).
In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM) or other non-volatile memory device or devices, and the like. The executable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.
Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.