The present disclosure relates generally to processor architectures and, more specifically, to execution of privileged operations.
In modern processing architectures, a typical central processing unit (“CPU”) operates on a large number of threads of execution at a time, switching between threads dedicated to handling operating system tasks and threads for various applications executing on that operating system.
In order to provide some logical guarantees to the individual applications, as well as security, CPUs can restrict the set of operations that can be utilized by a typical application. In practice, only a small amount of trusted code at the heart of the operating system, termed the kernel, is allowed to operate without restriction at an elevated privilege (e.g., kernel mode, master mode, supervisor mode, etc.) and perform any operation requested of the CPU. Other applications, including other portions of the operating system, operate at lower security levels (e.g., user mode, or an intermediate mode).
On occasion, an application may need to utilize a restricted operation available only at an elevated privilege level. In order to do so, the application may perform a system call (or “syscall”) to the kernel, which instructs the kernel to perform a certain operation on the application's behalf using the kernel's elevated privileges. However, system calls suffer from performance issues, for which a typical solution is to allow certain software (e.g., device drivers) to execute with elevated privileges at all times rather than having to request privilege elevation.
Accordingly, what is desired is an efficient technique for elevating privileges for a set of instructions.
The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the relevant art to make and use the disclosed embodiments.
The present disclosure will now be described with reference to the accompanying drawings. In the drawings, generally, like reference numbers indicate identical or functionally similar elements. Additionally, generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
The following detailed description refers to the accompanying drawings that illustrate exemplary embodiments. Other embodiments are possible, and modifications can be made to the embodiments within the spirit and scope of the disclosure. Therefore, the detailed description is not meant to limit the disclosure.
It would be apparent to one of skill in the art that the present disclosure, as described below, can be implemented in many different embodiments of software, hardware, firmware, and/or the entities illustrated in the figures. Any actual software code with the specialized control of hardware to implement the present disclosure is not limiting of the present disclosure. Thus, the operational behavior of the present disclosure will be described with the understanding that modifications and variations of the embodiments are possible, and within the scope and spirit of the present disclosure.
Reference to modules in this specification and the claims means any combination of hardware or software components for performing the indicated function. A module need not be a rigidly defined entity, such that several modules may overlap hardware and software components in functionality. For example, a software module may refer to a single line of code within a procedure, the procedure itself being a separate software module. One skilled in the relevant arts will understand that the functionality of modules may be defined in accordance with a number of stylistic or performance-optimizing techniques, for example.
In a traditional approach, a set of instructions may be fetched by fetch stage 102 in the order A, B, C, SYSCALL, D, E, F, by way of non-limiting example. When the SYSCALL instruction is fetched into pipeline 100 at fetch stage 102, it is not immediately recognized by the processor as a SYSCALL instruction. Only once the instruction is further down the pipeline (e.g., at decode stage 104, which may be several stages later in a longer pipeline) will it be recognized as such, and only at an even later stage (e.g., execution stage 110) would it be confirmed that the SYSCALL function is actually executed.
However, under the traditional approach, instructions D, E, and F following the SYSCALL instruction, as shown in user mode instruction stream 202, have been loaded into the pipeline as user mode privilege instructions. In order to process the kernel mode instruction stream 204, the pipeline will either need to be flushed upon execution of the SYSCALL instruction and loaded with instructions X, Y, Z, and ERET (which indicates the return of control to the user mode instruction stream) at an elevated privilege level, or the dispatch of instructions D, E, and F can be stalled.
Accordingly, the traditional operation of the SYSCALL instruction is to act as an exception. In a typical example, pipeline 100 is flushed when the SYSCALL instruction is executed. In the foregoing example, subsequent instructions D, E, and F would be flushed from the pipeline, and any changes made by processing of those instructions through pipeline 100 would not be committed. Instructions X, Y, and Z, as well as the ERET instruction, would then be fetched by fetch stage 102 at an elevated privilege level. Once control is returned to the user mode instruction stream, instructions D, E, and F could then be re-fetched and executed.
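By way of non-limiting illustration, the traditional flush-and-refetch sequence described above can be sketched as a small software model. The function name and the `in_flight_slots` parameter (the number of younger instructions already in the pipeline when SYSCALL reaches the execution stage) are assumptions introduced for this sketch and are not part of the disclosure.

```python
USER_STREAM = ["A", "B", "C", "SYSCALL", "D", "E", "F"]
KERNEL_STREAM = ["X", "Y", "Z", "ERET"]


def traditional_syscall_trace(user, kernel, in_flight_slots):
    """Model SYSCALL handled as an exception at the execution stage.

    Returns (completed, flushed): completed lists (instruction, mode) pairs
    in completion order; flushed lists the instructions that were discarded
    and later re-fetched.
    """
    completed, flushed = [], []
    for index, instr in enumerate(user):
        completed.append((instr, "user"))
        if instr == "SYSCALL":
            # Younger user mode instructions are discarded without committing.
            flushed.extend(user[index + 1:index + 1 + in_flight_slots])
            # The handler, ending in ERET, is fetched at elevated privilege.
            completed.extend((k, "kernel") for k in kernel)
        # After ERET the loop continues, which models re-fetching D, E, and F.
    return completed, flushed


completed, flushed = traditional_syscall_trace(USER_STREAM, KERNEL_STREAM, 3)
```

In this model, D, E, and F appear once in `flushed` and once more at the end of `completed`, which is the duplicated work the traditional approach incurs.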
The aforementioned approach suffers from the need to fully flush the pipeline after a SYSCALL instruction is executed, or to at least stall execution, causing several lost processing cycles. As with branch misprediction, the cost is equivalent to the number of stages from the fetch stage to the execution stage of pipeline 100. This cost, which would be incurred frequently by some types of software (e.g., device drivers), is generally considered unacceptable by software developers. This has led to the practice by some developers of simply providing elevated privileges to the entire piece of software (e.g., by running device drivers in kernel mode) and accepting the security risk associated with providing kernel mode privileges to that code (i.e., software executing in kernel mode should be trusted).
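The flush cost noted above can be expressed as simple arithmetic. The following is a minimal sketch, assuming that stages are identified by their position in the pipeline (not by their reference numerals) and that the penalty is the difference between the execution stage position and the fetch stage position. The function names and the example figures are hypothetical.

```python
def flush_penalty(fetch_position: int, execute_position: int) -> int:
    """Cycles lost per flush, taken as the stage distance from fetch to execute."""
    if execute_position < fetch_position:
        raise ValueError("execution stage must not precede the fetch stage")
    return execute_position - fetch_position


def lost_cycles(syscall_count: int, fetch_position: int, execute_position: int) -> int:
    """Total cycles lost when every SYSCALL triggers a full flush."""
    return syscall_count * flush_penalty(fetch_position, execute_position)


# Hypothetical figures: fetch at position 0, execute at position 4,
# and a driver issuing 1000 system calls.
total = lost_cycles(1000, 0, 4)
```

A deeper pipeline increases the per-call penalty proportionally, which is why the cost weighs most heavily on software that issues system calls frequently.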
A proposed solution in accordance with an embodiment relies on a privilege elevation mechanism that operates in a manner similar to a branch predictor. Instead of having to execute the entire piece of software with elevated privileges (as is typically done with device drivers) in order to avoid the performance cost associated with pipeline flushes as described above, a solution in accordance with an embodiment provides for speculative privilege elevation. As shown in pipeline 100, exemplary fetch stage 102 includes a prediction module 114, in accordance with an embodiment. Prediction module 114 can be integral to, or separate from, prediction functionality for handling branch prediction and any other predictive operations in pipeline 100.
In accordance with an embodiment, prediction module 114 determines at an early stage of pipeline 100 (e.g., fetch stage 102) whether a current instruction in that stage is a valid SYSCALL instruction.
Speculative privilege elevation in this manner effectively promotes the task of executing the SYSCALL instruction to an early stage of pipeline 100, such as fetch stage 102. The responsible logic will, upon identifying the SYSCALL instruction, begin fetching from the exception handling code and elevate the privilege level accordingly. This avoids the need, as with the traditional approach, to fetch and begin processing the subsequent instructions in the user mode instruction stream (e.g., instructions D, E, and F), and can therefore avoid the need to flush those instructions.
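For comparison with the traditional approach, the fetch behavior under speculative privilege elevation can be sketched as follows. This is an illustrative model only: the generator name and the string mode labels are assumptions, and a hardware fetch stage would identify the SYSCALL instruction by means of prediction module 114 rather than by string comparison.

```python
def speculative_fetch(user, kernel):
    """Yield (instruction, mode) pairs in the order the fetch stage sees them."""
    mode = "user"
    for instr in user:
        yield instr, mode
        if instr == "SYSCALL":
            # Elevate at fetch and redirect to the exception handling code.
            mode = "kernel"
            for handler_instr in kernel:
                yield handler_instr, mode
                if handler_instr == "ERET":
                    # Control returns to the user mode instruction stream.
                    mode = "user"


fetched = list(speculative_fetch(
    ["A", "B", "C", "SYSCALL", "D", "E", "F"],
    ["X", "Y", "Z", "ERET"],
))
```

In this sketch, D, E, and F are fetched exactly once, after ERET, so nothing from the user mode instruction stream needs to be flushed when the prediction is correct.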
The method then proceeds to step 306 where state data corresponding to the new privilege level is set, in accordance with an embodiment. This state data can be stored in a memory that can be rapidly accessed by fetch stage 102. At step 308, pipeline recovery information is stored in an embodiment in order to recover the state of the pipeline and the current instruction being processed (e.g., as indicated by a program counter) in the event of a privilege elevation misprediction, discussed in further detail below. At step 310, the state data is passed to the next pipeline stage (e.g., decode stage 104), indicating the elevated privilege level of the instruction, in accordance with an embodiment. The effect is such that while the state data indicating elevated privileges is present (e.g., a bit indicating kernel mode operation), any subsequent instructions will be passed to the next pipeline stage of pipeline 100 together with corresponding privilege data. This data is passed together with the instruction (e.g., instructions X, Y, and Z in the above example) through as many stages as the notion of elevated privileges is relevant, which may be the entire pipeline or some portion thereof. The method then ends at step 312.
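Steps 306 through 310 can be sketched in software as shown below. The `FetchState` structure, the `handler_pc` parameter, and the representation of recovery information as a (program counter, privilege) pair are assumptions made for illustration. The disclosure does not prescribe a particular encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FetchState:
    pc: int = 0
    privilege: str = "user"
    # Recovery information saved at step 308: (pc, privilege) before elevation.
    recovery: Optional[Tuple[int, str]] = None


def elevate(state: FetchState, handler_pc: int) -> None:
    """Apply steps 306 and 308 when a SYSCALL is identified at fetch."""
    prior = (state.pc, state.privilege)
    state.privilege = "kernel"  # step 306: set state data for the new level
    state.recovery = prior      # step 308: keep what is needed to undo it
    state.pc = handler_pc       # begin fetching the exception handling code


def pass_to_next_stage(state: FetchState, instr: str) -> Tuple[str, str]:
    """Step 310: hand the instruction onward together with its privilege data."""
    return instr, state.privilege


state = FetchState(pc=12)
elevate(state, handler_pc=0x80)
tagged = pass_to_next_stage(state, "X")
```

Capturing the prior state before overwriting it keeps the recovery information valid regardless of the order in which the two steps are physically performed.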
One exemplary implementation is to dispose a latch between fetch stage 102 and the subsequent stage (e.g., decode stage 104) to serve as the state data store, in accordance with an embodiment. The data latched by this latch is modified whenever the privilege state changes, and provides decode stage 104 with the corresponding privilege level for an instruction. Each subsequent stage may continue to pass this bit or bits of data to other subsequent stages as needed, and the architecture of pipeline 100 is developed, in accordance with an embodiment, to accommodate passage of this extra data among the pipeline stages.
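The latch arrangement and the stage-to-stage passage of privilege data can be modeled as follows. This is a simplified sketch under stated assumptions: the class and function names are hypothetical, the pipeline is reduced to three stages after fetch, and clocking is modeled as one call to `advance` per cycle.

```python
class PrivilegeLatch:
    """Hypothetical state data store sitting between fetch and decode."""

    def __init__(self):
        self.level = "user"

    def update(self, level):
        # Written only when the privilege state changes.
        if level != self.level:
            self.level = level


def advance(stages, incoming):
    """Shift every (instruction, privilege) pair one stage forward.

    stages[0] is the stage right after fetch; the pair leaving the last
    stage is returned.
    """
    leaving = stages[-1]
    stages[1:] = stages[:-1]
    stages[0] = incoming
    return leaving


latch = PrivilegeLatch()
stages = [None, None, None]
advance(stages, ("SYSCALL", latch.level))  # fetched in user mode
latch.update("kernel")                     # privilege state changes
advance(stages, ("X", latch.level))
advance(stages, ("Y", latch.level))
```

Because each pair carries its own privilege value, an instruction fetched before the latch changed keeps its original level as it moves through later stages.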
One skilled in the relevant arts will further recognize that the techniques described herein may be applied to other types of instructions with similar characteristics to a SYSCALL instruction. For example, other privilege elevation instructions may operate in a similar manner and are contemplated within the scope of this disclosure. Likewise, hypervisor calls in a virtualized environment (i.e., calls made by a virtualized environment to the hypervisor) can benefit from a similar approach. In newer CPU architectures, embedded virtualization support includes the addition of new privilege levels in which the hypervisor operates, and calls can be made for transitioning to hypervisor mode (as with kernel mode in the examples provided herein). These approaches are also contemplated within the scope of this disclosure.
III. Privilege Commitment and Resolution
With the privilege escalation instruction (e.g., SYSCALL) identified and the instructions following the SYSCALL being flagged for execution at an elevated privilege level, it nevertheless remains possible for the SYSCALL instruction itself to not execute. For example, the SYSCALL instruction may be part of a mispredicted instruction branch, or an exception or interrupt may be executed prior to execution of the SYSCALL instruction by the execution pipeline stage (e.g., stage 110).
In the case of a branch misprediction, one skilled in the relevant arts will appreciate that a number of existing techniques can be utilized in the form of branch misprediction correction logic to correct the processor's fetch logic and ensure that any instructions from the mispredicted path are flushed out and do not commit any state information. In addition, this exemplary solution including speculative privilege elevation provides for correcting the privilege level and broadcasting this corrected privilege level to the fetch stage 102. As a result, subsequent instructions fetched from the correct branch will be fetched with the correct privilege level.
Several techniques can be utilized for reducing the cost of such a pipeline flush, or for compromising on the cost in exchange for increased performance when such a pipeline flush is not needed.
At step 406, a determination is made as to whether privilege elevation by the SYSCALL was successfully made at the later pipeline stage (e.g., during execution of the SYSCALL instruction at execution stage 110). If privilege elevation was in fact successful, then the block is released at step 410 and the instructions continue processing at step 412, as noted above. However, if the SYSCALL instruction was not executed (due to, e.g., a branch misprediction or an exception or interrupt that prevents the SYSCALL instruction from executing), the result is a privilege elevation misprediction. At step 408, the mispredicted instructions having an elevated privilege level are flushed, and the pipeline 100 state is recovered to the point at which the misprediction occurred. In accordance with an embodiment, pipeline 100 state recovery relies on information stored at step 308 of
A non-limiting example where this approach would be utilized is in the case where a branch prediction causes pipeline 100 to process the instructions of instruction stream 200. The instructions may be processed in the order A, B, C, SYSCALL, then into kernel mode instructions 204 X, Y, and Z based on the speculative privilege elevation process described above. While speculatively executing instructions X, Y, and Z at an elevated privilege level, the pipeline may be notified that the entire branch (e.g., all of instruction stream 200) was mispredicted, and execution should be taking place on a different set of instructions. This could occur if, for example, instruction B is a branch or interrupt instruction that was initially mispredicted by pipeline 100. Using the aforementioned approach, the CPU fetch logic is redirected to the correct branch (e.g., instructions R, S, T, etc., corresponding to instructions of the correct instruction branch). In addition, since the privilege level at the time of the misprediction has been stored in an embodiment, the privilege level is restored to the state at the time when the branch instruction was fetched.
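The branch misprediction example above can be sketched as a checkpoint-and-restore model. The `FetchUnit` class, the branch tags, and the numeric addresses are hypothetical and serve only to illustrate saving the privilege level when a branch is fetched and restoring it when that branch is found to be mispredicted.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    pc: int
    privilege: str


class FetchUnit:
    def __init__(self, pc=0):
        self.pc = pc
        self.privilege = "user"
        self.checkpoints = {}  # branch tag -> state when the branch was fetched

    def fetch_branch(self, tag, predicted_target):
        # Record the privilege level in effect when the branch is fetched.
        self.checkpoints[tag] = Checkpoint(self.pc, self.privilege)
        self.pc = predicted_target

    def fetch_syscall(self, handler_pc):
        # Speculative privilege elevation at the fetch stage.
        self.privilege = "kernel"
        self.pc = handler_pc

    def branch_mispredicted(self, tag, correct_target):
        # Redirect fetch to the correct branch and undo any elevation
        # that happened on the mispredicted path.
        saved = self.checkpoints.pop(tag)
        self.privilege = saved.privilege
        self.pc = correct_target


unit = FetchUnit(pc=4)
unit.fetch_branch("B", predicted_target=100)   # predicted path holds SYSCALL
unit.fetch_syscall(handler_pc=0x800)           # X, Y, Z fetched in kernel mode
unit.branch_mispredicted("B", correct_target=200)  # R, S, T are fetched next
```

After the final call, the fetch unit is back in user mode and pointed at the correct branch, so instructions R, S, and T would be fetched at the privilege level that applied when instruction B was fetched.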
The ability to restore the privilege level to the prior state corresponding to the time of the mispredicted branch allows for effects of speculative privilege elevation (e.g., executing instructions in kernel mode) to be reverted in case the privilege elevation should never have occurred. In an embodiment, existing branch misprediction logic performs a number of tasks, such as reverting the program counter to an earlier state, in order to undo the effects of a branch misprediction. The disclosed approach further provides facilities for undoing the effects of privilege elevation in a similar manner, in the event that the privilege elevation instruction (e.g., SYSCALL) should never have executed.
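The resolution decision described earlier in this section (steps 406 through 412) can be sketched as a single function. This is an illustrative model only. It assumes that speculatively elevated instructions are held (blocked from committing) until the SYSCALL instruction resolves, and that the recovery information is a (program counter, privilege) pair. The function name and the dictionary result are hypothetical.

```python
def resolve_elevation(syscall_executed, held, recovery):
    """Decide what happens to held instructions once SYSCALL resolves.

    held: speculatively elevated instructions awaiting resolution.
    recovery: (pc, privilege) saved when elevation was predicted.
    """
    if syscall_executed:
        # Steps 410 and 412: release the block and continue processing.
        return {"action": "release", "continue_with": list(held),
                "privilege": "kernel"}
    # Step 408: privilege elevation misprediction, flush and recover.
    pc, privilege = recovery
    return {"action": "flush", "flushed": list(held),
            "refetch_pc": pc, "privilege": privilege}


confirmed = resolve_elevation(True, ["X", "Y", "Z"], (12, "user"))
mispredicted = resolve_elevation(False, ["X", "Y", "Z"], (12, "user"))
```

In the mispredicted case none of the held instructions commit, and fetching resumes from the saved program counter at the saved privilege level.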
However, if privilege elevation was mispredicted, the pipeline is flushed at step 508 and the first mispredicted instruction is fetched again at fetch stage 102, a process that is similar to that of flowchart 400 in
While various embodiments of the present disclosure have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant art(s) that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined in the appended claims. It should be understood that the disclosure is not limited to these examples. The disclosure is applicable to any elements operating as described herein. Accordingly, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.