1. Field of the Invention
This invention relates generally to security in computer processors and particularly to automated enforcement of security policies in such processors.
2. Description of the Related Art
The development effort required to build a system is directly proportional to the cost of its failure; hence critical systems used in space shuttles and banks undergo much more rigorous development cycles than systems for home users. Such high assurance, trustworthy systems require a tremendous investment of time, effort, and money by their small community of users, and, in comparison to commodity systems, lag far behind in performance and programmability. Unfortunately, for commodity processors, security threats are often not considered at the rapidly changing Instruction Set Architecture (ISA) or micro-architecture levels. Allowing commodity parts to be retrofitted with protection mechanisms without increasing the cost for ordinary users would offer a significant advantage for high assurance system development.
The economics of trustworthy system development has placed designers under constraints not faced by low assurance, commodity systems. For example, the expense of special purpose hardware can make it costlier to provide both high performance and strong security. Even when hardware vendors incorporate security enhancements, integrating these mechanisms into a complex system design may present many practical and theoretical problems, driving up the costs and driving out the release schedule. This is especially true at the highest Common Criteria Evaluation Assurance Levels (EALs). A 2006 GAO report analyzing the cost of Common Criteria evaluations (of the more common EAL2, EAL3, and EAL4 variety) found, not surprisingly, that higher assurance levels tend to be costlier and more time-consuming. In addition to the fact that such system development costs per unit are very high, users requiring such functionality make up a small portion of the market. Sophisticated security mechanisms at the hardware level are typically targeted at a relatively small market sector and add unacceptable costs to commodity products.
Due to the high non-recurring engineering (NRE) cost of manufacturing custom hardware and the small amortization base of low volume products, manufacturers are often forced to choose less costly alternatives, such as an older, cheaper process (e.g., 0.5 μm vs. 45 nm). For this reason, the gap in performance between low volume (e.g., military) and commercial systems grows every year, with commercial hardware performance dominating by a factor of one hundred, a gap that did not exist thirty years ago. For example, according to the Institute for Defense Analyses, the United States Department of Defense (DOD), as a low-volume customer, has benefited from some of this explosion in the commercial integrated circuit market, but DOD has increasingly encountered challenges in getting appropriate and affordable access to technology and products.
As a result of these economic factors, designers of trustworthy systems requiring high performance need some way to incorporate commercial hardware components without compromising security. Modern integrated circuit devices for general purpose processing (“GP processors”) are complex and expensive. Although GP processors are highly refined, market economics demand that they address the general case, in which it is not possible to include in the integrated circuit dedicated mechanisms to enforce security policies during processing. The general design paradigm is that the GP processor should include only those mechanisms and functions that cannot be implemented efficiently as a software program that comprises invocations of the mechanisms already provided by the GP processor.
An operating system (“OS”) is a software program that provides instructions directly to the GP processor. An OS is responsible for managing the physical resources of the computer (e.g., main memory, disk memory, and various I/O devices), via GP processor instructions, while providing an execution environment for applications to access (abstractions of) those resources in a “secure” way. The definition of “secure” varies from OS to OS, and a given configuration of hardware and software results in what is called the computer's “automated security policy.” Mechanisms to control the actions of active elements of a computer system are sometimes called reference monitors when they are non-bypassable, self-protecting, and minimized.
It is difficult to maintain the confidentiality and integrity of data that is processed by a GP processor. To do so with a high degree of assurance requires purpose-built “secure operating systems” that require precise validation of correctness, and are therefore expensive. Commercial operating systems cannot be depended upon to enforce many automated security policies, such as those required to protect highly valued information.
“Multi-die” technology provides a way to add circuitry to a GP processor, for passively observing its behavior, without requiring much change to the GP processor. Recent research in “multi-die” integrated circuit technology has provided a minimally invasive means to integrate monitoring circuits into the GP processor. In this approach, sockets, each of which can accept a communication post, are integrated into the design of the GP processor, such that it can be manufactured with or without an additional die used for security purposes. During the manufacture of the GP processor, if it is to be enhanced with this extra circuitry, another die (the “control plane”) is attached to the GP processor (viz., the “computation plane”), in such a way that signals can pass between the planes through specific “vias” or “posts” that connect to the sockets. This method was originally designed for passive monitoring, and to date all publications on this subject have been limited to passive monitoring.
Commercial operating systems do not adequately control the activities of applications that they host in a secure manner. Addition of software logic to these OSs is not a feasible solution, as the OS is too complex to be able to verify that the resulting enhanced OS would enforce the desired automated security policy. Secure operating systems incorporate the desired software logic for controlling applications in a secure and verifiable manner. However, the development and verification processes required to provide a high assurance of the correctness of enforcement of the automated security policy result in a high cost.
Embodiments in accordance with the invention disentangle specialized security mechanisms from the commodity design and provide the addition of security functionality to a processor as a foundry-level configuration option. In accordance with one embodiment, a computing system includes: a computation plane that includes one or more dies arranged for performing computation, which, in certain instances, is required to be secure; a control plane that includes one or more dies performing operations necessary to ensure the security of the entire system; a plurality of direct electrical connections between the computation plane and control plane; and a plurality of electronic interfaces arranged to allow the control plane to activate and control portions or the whole of the computation plane for the purposes of increasing the security of its operation.
In accordance with another embodiment, a method for controlling access of a computer processor to a resource includes: (a) blocking uncontrolled access of the computer processor to the resource; (b) providing a control plane that includes data corresponding to a security policy; (c) providing a first signal post between the computer processor and the control plane to transfer signals from the computer processor to the control plane; (d) modifying signals from the computer processor so that the signals conform to the security policy; and (e) enabling the computer processor to have access through the control plane to transfer signals to the resource that conform to the security policy.
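By way of illustration only, and not of limitation, steps (a) through (e) of the foregoing method may be modeled in software as follows. This is a minimal behavioral sketch, not a circuit description. The names Signal, Resource, ControlPlane, Processor, and protect_region are hypothetical and do not appear elsewhere in this disclosure, and the example policy (demoting writes into a protected address range to reads) is an assumption chosen only to show a signal being modified so that it conforms to a policy.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Signal:
    """Hypothetical model of one memory request leaving the processor."""
    address: int
    write: bool


# A policy returns a conforming signal, or None to block the request entirely.
Policy = Callable[[Signal], Optional[Signal]]


def protect_region(start: int, end: int) -> Policy:
    """Assumed example policy: writes into [start, end) are demoted to reads."""
    def policy(sig: Signal) -> Optional[Signal]:
        if sig.write and start <= sig.address < end:
            return replace(sig, write=False)
        return sig
    return policy


class Resource:
    """Stands in for the resource; records every signal that reaches it."""
    def __init__(self) -> None:
        self.received: List[Signal] = []

    def receive(self, sig: Signal) -> None:
        self.received.append(sig)


class ControlPlane:
    def __init__(self, policy: Policy, resource: Resource) -> None:
        self._policy = policy        # step (b): data corresponding to the policy
        self._resource = resource    # only the control plane can reach the resource

    def from_processor(self, sig: Signal) -> bool:
        """Step (c): a signal arrives from the processor over the signal post."""
        conforming = self._policy(sig)        # step (d): make the signal conform
        if conforming is None:
            return False
        self._resource.receive(conforming)    # step (e): forward conforming signal
        return True


class Processor:
    """Step (a): the processor holds no direct reference to the resource."""
    def __init__(self, control_plane: ControlPlane) -> None:
        self._cp = control_plane

    def store(self, address: int) -> bool:
        return self._cp.from_processor(Signal(address, write=True))

    def load(self, address: int) -> bool:
        return self._cp.from_processor(Signal(address, write=False))
```

In this sketch, separation is modeled simply by the Processor object having no path to the Resource other than through the ControlPlane object.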
In accordance with a further embodiment, a security system for controlling access of a computer processor to a resource includes: a control plane that includes data corresponding to a security policy; a first signal post connected between the computer processor and the control plane to transfer signals from the computer processor to the control plane; a second signal post connected between the computer processor and the control plane to transfer signals that conform with the security policy from the computer processor to the resource; an apparatus in the control plane for modifying signals from the computer processor so that the signals conform to the security policy so that the computer processor is connected through the control plane to transfer signals to the resource that conform to the security policy; a cache eviction monitor located in the control plane for eliminating access-driven cache side channel attacks; memory elements connected to the computer processor for storing security bits that hold the permissions of a process to evict shared cache entries of other processes; and comparator circuitry arranged for comparing the security bits with instructions to load or store data to determine whether to allow a cache eviction.
Embodiments in accordance with the invention are best understood by reference to the following detailed description when read in conjunction with the accompanying drawings.
Embodiments in accordance with the invention provide a new and modular way to add security mechanisms to current and next generation processors through the use of 3-D interconnects. In one embodiment, these security mechanisms are implemented in a physical overlay including a separate plane of circuitry stacked on top of a commodity integrated circuit, e.g., chip. In various embodiments, the security mechanisms that reside in this overlay can be connected to the underlying chip with a variety of interconnect technologies, yet can be completely omitted without change to the commodity chip's function and without affecting its cost.
Embodiments in accordance with the invention provide means for integrating dedicated security-enforcement functions into the circuits of a GP processor while perturbing the GP processor to a small enough degree that the changes are acceptable to GP processor manufacturers. Accordingly, embodiments in accordance with the invention provide an innovative application of multi-die technology for actively controlling the activities of the GP processor to enable more secure processing with commercial operating systems and to lower the cost of secure operating systems.
Embodiments in accordance with the invention utilize an active layer, herein called a 3-D control plane, which is specifically dedicated to security to implement a variety of security functions in a cost-effective and computationally efficient way. Specifically, embodiments in accordance with the invention provide a method for using 3-D integration for trustworthy system development, and combine an independently fabricated 3-D control plane containing arbitrary security functions, such as micro-architectural protection mechanisms, along with a commodity integrated circuit, referred to henceforth as the computation plane.
Security functions can be broadly classified as either active or passive monitors, depending upon whether the 3-D control plane modifies signals on the computation plane. Embodiments in accordance with the invention include precise circuit level primitives to build both active and passive monitors such that signals on the computation plane can be arbitrarily tapped, disabled, re-routed, or even overridden. Also disclosed herein is an exemplary overview of how the 3-D control plane can be integrated in a purely optional and minimally intrusive manner with very minor modification to the commodity computation plane.
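For purposes of illustration, the four signal-level operations named above (tapping, disabling, re-routing, and overriding) may be summarized by the following behavioral sketch. Each hypothetical function returns a pair: the value observed at the original destination on the computation plane, and the value observed on the 3-D control plane. The use of None to represent a line that is not driven is a modeling assumption, as are the function names; the sketch describes only the logical effect of each operation and not any particular transistor arrangement.

```python
from typing import Optional, Tuple

# (value seen at the original destination, value seen on the control plane)
Observed = Tuple[Optional[int], Optional[int]]


def tap(value: int) -> Observed:
    """Passive: the original path is undisturbed and a copy goes upward."""
    return value, value


def disable(value: int) -> Observed:
    """The link is cut; neither the destination nor the control plane sees it."""
    return None, None


def reroute(value: int) -> Observed:
    """The original path is cut and the signal is delivered only upward."""
    return None, value


def override(value: int, injected: int) -> Observed:
    """The original driver is cut off and the control plane drives the
    destination with its own value, while still observing the original."""
    return injected, value


def is_active(before: int, after: Observed) -> bool:
    """A monitor is active if the destination no longer sees the original value."""
    return after[0] != before
```

Under this model, only tap leaves the destination value unchanged, which corresponds to the passive monitor classification; the other three operations alter what the destination sees and therefore correspond to active monitors.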
In accordance with one embodiment, two pieces of silicon are fused together to form a single chip. The two active layers of the silicon, the commodity computation plane and 3-D control plane, are connected through inter-die vias, such as micron-width wires that are, e.g., chemically “drilled-and-filled” between the layers, that run vertically between the active layers. This ability to interconnect multiple active layers enables the optional addition of a plane to a processor specifically for security. This 3-D control plane has access to the security dependent signals of the system. A processor with this ability could be provided to customers requiring, for example, mechanisms to control information flow when security policies must be enforced or other security-specific support, whereas commodity systems simply might not include this extra, more costly, 3-D control plane.
For certain architectural arrangements of control features and computation cores, embodiments in accordance with the invention allow the secure processing of information using commercial OSs. Such arrangements include but are not limited to the use of multiple-core GP processors (“chip multi-processors,” or CMP), where a distinct OS is dedicated to managing each core, and each core is dedicated to processing information of one of several mutually suspicious activities, and the control plane is configured to control the interactions between cores.
Referring now particularly to
In one embodiment, active element 22 is separated from resource 24 and any signals that would have transited between active element 22 and resource 24 are routed to a control plane (using the multi-die method), which modifies the signals so that their effects conform with an automated security policy before routing them back to a computation plane.
Separation of active element 22 from a given resource 24 can be achieved by various means during the processor's lifecycle (e.g., design, manufacture, installation, or initialization). For example, during processor design, separation can be provided by ensuring there are no physical or logical electrical connections. Separation can be achieved through configuration of resources during installation or initialization if that (configuration) is included as a native capability of GP processor 20; and separation can be provided after manufacture through physically altering the circuitry. In particular, in accordance with one embodiment an override post installed during manufacture of GP processor 20 is included that provides separation, which requires minimal changes to the native processor electronics.
Referring to
As shown in
As shown in
In various embodiments, control plane 46 may be implemented in various circuit technologies, including FPGA and ASIC. Further embodiments in accordance with the invention can be applied to a wide variety of computational circuits, including but not limited to General Purpose processors, FPGAs and ASICs. In various embodiments, the computation plane 45 can be a single core or CMP.
In one embodiment, control plane 46 includes a metal layer 50 formed on a silicon substrate 51. This arrangement can be utilized to disable a bus 52 on computation plane 45 to ensure resource isolation. In one embodiment, computation plane 45 and 3-D control plane 46 are connected with inter-die vias, or through-silicon vias (TSVs) 54-57, which serve as posts. Posts are required to tap the signals needed by the security logic. In one embodiment, sleep transistors connected to posts 55 and 56 are used to disable bus 52 on computation plane 45. In one embodiment, posts 54 and 57 carry the rerouted signal from computation plane 45 to control plane 46, where reference monitor logic 64 enforces a security policy on the rerouted bus traffic.
Referring to
In one embodiment, posts 54 and 57 are connected between active monitor logic 64 in control plane 46 and CMOS logic circuits 65 and 66 in computation plane 45. In one embodiment, 3-D control plane 46 can include several security functions on one chip. These functions can be implemented as either passive or active monitors. Notably, embodiments in accordance with the invention provide the ability for active monitoring of computation plane 45 in 3-D control plane 46.
One use of 3-D control plane 46 is to act as a passive monitor, simply accessing and analyzing data from computation plane 45. For example, control plane 46 can monitor accesses to a particular region of memory or audit the use of a particular set of instructions. To monitor these events, it is necessary to know when such events are occurring, which necessitates tapping some of the wires from the processor. This requires adding posts and vias to the instruction register and memory wires to gain direct access to the currently executing instruction. Passive monitoring can be implemented in 3-D technology, utilizing a set of vias to the top of computation plane 45, and then post 57 from there to 3-D control plane 46.
Whereas passive monitoring allows for auditing, anomaly detection and the identification of suspicious activities, systems enforcing security policies often require strong guarantees about restrictions to overall system behavior. Embodiments in accordance with the invention allow the use of active monitors to control information flow between cores, the arbitration of communication, and the partitioning of resources.
The key ability needed to support such functionality is to reroute signals to control plane 46 and then override them with potentially modified signals. With this technology and minor modification of computation plane 45, all inter-core communication, memory accesses, and shared signals can be forced to travel to control plane 46 where they are subject to both examination and control. For example, active monitoring can ensure that confidential data being sent between two cores (which are traditionally forced to traverse a shared bus) is not leaked to an unintended third recipient with access to that bus.
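The inter-core example above may be illustrated by the following minimal sketch, which contrasts a conventional shared bus with a bus whose traffic is rerouted through a model of the control plane. The class names SharedBus and MediatedBus, the core labels, and the representation of the security policy as a set of permitted (source, destination) pairs are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


class SharedBus:
    """Baseline: every core attached to the bus observes every transfer."""

    def __init__(self, cores: Iterable[str]) -> None:
        self.inbox: Dict[str, List[str]] = {core: [] for core in cores}

    def send(self, src: str, dst: str, payload: str) -> None:
        for core in self.inbox:
            if core != src:
                self.inbox[core].append(payload)


class MediatedBus(SharedBus):
    """Transfers are rerouted to a control plane model, which delivers a
    payload only to the intended recipient and only if the flow is allowed."""

    def __init__(self, cores: Iterable[str],
                 allowed_flows: Iterable[Tuple[str, str]]) -> None:
        super().__init__(cores)
        self.allowed_flows: Set[Tuple[str, str]] = set(allowed_flows)

    def send(self, src: str, dst: str, payload: str) -> None:
        if (src, dst) in self.allowed_flows:
            self.inbox[dst].append(payload)
```

With three cores and a single permitted flow from the first core to the second, a transfer on the SharedBus model is visible to the third core, whereas the same transfer on the MediatedBus model reaches only the second core.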
In one embodiment, modifying signals on computation plane 45 is accomplished in two parts. The first part is to ensure that the monitor has unfettered access to all the signals (tapping), which is, in essence, the same as the passive monitoring scenario described above. The second part is to selectively disable those links, essentially milling off portions of the computation plane (e.g., a bus), or override them to inject different values. The difficulty is that a capability (the connection between two components) is removed only by adding control plane 46 (which cannot physically cut or impede that wire). Computation plane 45 must be fully functional without an attached 3-D control plane 46, yet it needs to be constructed so that by wiring in some extra circuitry the targeted capability can be completely disabled. To accomplish this, components in computation plane 45 must be modified to support the active monitoring.
In one embodiment, an alternative method for disabling links is to physically impede the connection itself. An existing circuit technique called power gating is used for this purpose. Support for power gating is added through the addition of sleep transistors placed between a circuit's logic and its power/ground connections. The sleep transistors act as switches that effectively remove the power supply from the circuit. The circuit is awake when the transistors are activated by a specific signal, which provides power to the circuit allowing it to function normally. Alternatively, the sleep transistors can be given the opposite input and turned off, thus disconnecting the power to the circuit, temporarily removing all functionality, and effectively putting the circuit to sleep.
Sleep transistors are traditionally used to temporarily disable unused portions of an integrated circuit, thereby saving power by preventing leakage current. However, their use is also beneficial for providing the isolation an active monitor requires. With only a small amount of added hardware (two transistors 70, 72 and two resistors 76, 84, shown in
In addition to selectively removing power from some components on-chip, sleep transistors may be used to perform several key functions on data and control lines required by active monitors. Sleep transistors can be placed on any link that may need to be disabled or controlled. 3-D control plane 46 can manage them by simply providing a post that connects to their gate input. The following functions all use only one or two transistors per line and present a new set of options for trustworthy system development.
Referring to
Tapping can be used to send the requested signals to 3-D control plane 46 without interrupting their original path. As shown in
Re-routing as shown in
Another use of re-routing is using a signal for a different purpose than was originally intended. Once on 3-D control plane 46, the signal can be analyzed and combined with other data from 3-D control plane 46 or computation plane 45, or simply stored for later use. This can then be coupled with overriding (
Overriding (
An address signal from CPU 112 and a lock bit 142 are input to cache eviction monitor 110. Cache eviction monitor 110 includes security bits 134 that provide a process ID (PID) signal that is input to a comparator 136 for comparison to a Process ID (PID) 144. A locked signal is also output from security bits 134. Comparator 136 output and the locked signal are input to an OR gate 140 that has an output connected to cache 118 to provide a write enable signal thereto.
The address of the corresponding load/store is tapped to be sent to 3-D control plane 46, and the cache write-enable signal is overridden in the case of a locked cache line eviction. Lock bit 142 and the Process ID (PID) 144 are also provided to 3-D control plane 46. Once cache monitor 110 receives the load/store address, lock bit 142, and PID, it can determine whether a cache eviction can be granted based on whether the cache line is locked or whether the PID matches, and issue the appropriate override signal on the cache write-enable signal.
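The eviction decision described above may be sketched, for illustration only, as a single Boolean function. The function name and parameter names are hypothetical. The polarity of the locked signal fed to OR gate 140 is not stated explicitly, so the sketch assumes that the OR gate receives a signal that is asserted when the cache line is not locked; under that assumption, the write-enable output is asserted when the line is unlocked or when the requesting PID matches the PID held in the security bits.

```python
def cache_write_enable(line_locked: bool, owner_pid: int,
                       requester_pid: int) -> bool:
    """Behavioral model of the override applied to the cache write-enable.

    Assumption: eviction is granted if the line is not locked, or if the
    requesting process is the one recorded in the security bits.
    """
    pid_match = owner_pid == requester_pid   # models comparator 136
    unlocked = not line_locked               # assumed polarity of locked signal
    return pid_match or unlocked             # models OR gate 140
```

Under this model, the only case in which the write enable is deasserted is a request from a different process to a locked line, which is the locked cache line eviction that the monitor is intended to prevent.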
In one embodiment, the custom architecture of
In one embodiment, a method to prevent these attacks uses 3-D control plane 46 to maintain a cache protection structure that indicates, for each cache line, whether it is protected, and if so, for which process. When a different process loads or stores data related to a protected cache line, no eviction will occur, and the data is not cached unless an alternate line is available in the cache protocol being used.
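A cache protection structure of the kind described above may be sketched as follows for a single cache set. The class name ProtectedSet, the per-way owner field, and the choice of the first eligible way as the victim are assumptions made for illustration; an actual cache would apply its own replacement protocol among the eligible lines.

```python
from typing import List, Optional


class ProtectedSet:
    """One cache set. owner[i] is the PID protecting way i, or None."""

    def __init__(self, n_ways: int) -> None:
        self.tags: List[Optional[str]] = [None] * n_ways
        self.owner: List[Optional[int]] = [None] * n_ways

    def fill(self, tag: str, pid: int, protect: bool = False) -> Optional[int]:
        """Try to cache `tag` on behalf of process `pid`.

        Returns the way used, or None when every way is protected by some
        other process, in which case nothing is evicted and the data is
        not cached.
        """
        for way in range(len(self.tags)):
            if self.owner[way] is None or self.owner[way] == pid:
                self.tags[way] = tag
                self.owner[way] = pid if protect else None
                return way
        return None
```

In this sketch, a line protected by one process is never chosen as a victim for another process, so the second process can neither evict the line nor infer its use by observing such an eviction.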
Delivery of the previously mentioned required information to 3-D control plane 46 is through the vertical posts. A general idea of the number of posts 3-D control plane 46 needs on a given system is the sum of the number of bits of: the address size, the process ID size, possibly one post for the secure register, and a grant bit post. This results in fewer than 100 vias, which equates to about the silicon space for 50 bits of memory, which is a small and reasonable number of vertical posts to implement a strong security measure.
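The post count given above can be checked with a short calculation. The function name is hypothetical, and the 64-bit address and 16-bit process ID used in the example are assumed widths chosen only for illustration; they are not specified in this disclosure.

```python
def post_count(address_bits: int, pid_bits: int,
               secure_register_post: bool = True) -> int:
    """Sum of the posts listed above: address, PID, optional secure
    register post, and one grant bit post."""
    grant_bit_posts = 1
    return (address_bits + pid_bits
            + (1 if secure_register_post else 0)
            + grant_bit_posts)


# Assumed widths: 64-bit address, 16-bit process ID.
example = post_count(address_bits=64, pid_bits=16)  # 64 + 16 + 1 + 1 = 82
```

For these assumed widths the total is 82 posts, which is consistent with the stated figure of fewer than 100 vias.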
This disclosure provides exemplary embodiments of the present invention. The scope of the present invention is not limited by these exemplary embodiments. Numerous variations, whether explicitly provided for by the specification or implied by the specification or not, may be implemented by one of skill in the art in view of this disclosure.
This application claims the benefit of U.S. Provisional Application No. 61/303,422, filed Feb. 11, 2010, which is hereby incorporated in its entirety by reference.
Number | Date | Country
---|---|---
61303422 | Feb 2010 | US