Central processing units (CPUs), such as those found in network processors and other computing systems, often implement out-of-order execution to complete work. In out-of-order execution, a processor executes instructions in an order based on the availability of input data and execution units, rather than by their original order in a program. As a result, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next available instructions independently. The processor may also implement branch prediction, whereby the processor performs a speculative execution based on the data immediately available. If the speculation is validated, the results are immediately available, increasing the speed of the execution. Otherwise, incorrect results are discarded.
Typical out-of-order machines are exposed to security vulnerabilities inherent in some misspeculation events. During a speculative execution, data may be loaded into a cache, where it can remain after the speculation is determined to be incorrect. An attacker can then run additional code to access this data. Spectre and Meltdown are the names given to two security vulnerabilities that can be exploited in this manner.
Example embodiments include a method of managing an out-of-order machine to prevent leakage of information following a misspeculation event. Information regarding a first state of the out-of-order machine is stored to a reorder buffer. The information can indicate a state of one or more registers, location of data, and/or the state of scheduled, pending and/or completed instructions. When the machine progresses to a second state (e.g., during or after a speculation operation), information regarding the second state may be stored to the reorder buffer. This information indicates whether data is moved to a cache during transition from the first state to the second state. In response to detecting a misspeculation event of the second state, access is prevented to at least a portion of the cache storing the data.
In further embodiments, preventing access may include invalidating the at least a portion of the cache storing the data, and/or invalidating an entirety of the cache. The cache may include a d-cache, a branch target cache, a branch target buffer, a store-load dependence predictor, an instruction cache, a translation buffer, a second level cache, a last level cache, and/or a DRAM cache. In response to detecting the misspeculation event, a branch predictor, a branch target cache, a branch target buffer, a store-load dependence predictor, an instruction cache, a translation buffer, a second level cache, a last level cache, and/or a DRAM cache may be invalidated.
The information regarding the first and second states may include the locations of cache blocks storing the data, as well as an indication of cache blocks created during execution of one or more operations associated with the misspeculation event. Based on the information regarding the second state, cache blocks created during execution of operations associated with the misspeculation event may be identified. The misspeculation event may occur during a load/store operation executed by the machine. The first state may correspond to a state of the machine prior to execution of the load/store operation, and the second state may correspond to a state of the machine after the execution of the load/store operation.
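One way to picture how the two state records identify the affected cache blocks is a set difference between the block locations recorded in each state. The following Python sketch is illustrative only: the `StateRecord` type, its fields, and the `blocks_created` helper are hypothetical names chosen for this example and do not describe an actual hardware structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateRecord:
    """Hypothetical record of machine state as stored to the reorder buffer."""
    registers: tuple          # assumed: snapshot of register values
    cached_blocks: frozenset  # assumed: locations of blocks present in the cache


def blocks_created(first: StateRecord, second: StateRecord) -> frozenset:
    """Blocks present in the second state that were absent in the first."""
    return second.cached_blocks - first.cached_blocks
```

For example, if the first state records blocks {1, 2} and the second state records blocks {1, 2, 9}, the helper returns {9}, which is the block a controller in this sketch would treat as created during the load/store operation.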
Further embodiments may include an out-of-order machine comprising a cache, a processor configured to execute an instruction, a reorder buffer, and a controller. The controller may be configured to 1) store information regarding a first state of the out-of-order machine to the reorder buffer prior to execution of the instruction; 2) store information regarding a second state of the out-of-order machine to the reorder buffer following execution of the instruction, the information indicating whether data is moved to the cache between the first and second states; and 3) in response to detecting a misspeculation event of the second state, prevent access to at least a portion of the cache storing the data.
The foregoing will be apparent from the following more particular description of example embodiments, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments.
Example embodiments are described in detail below. Such embodiments may be implemented in any suitable computer processor, particularly out-of-order machines such as a network services processor or a modern central processing unit (CPU).
The machine 100 includes a processor 105, a register file 108, a cache 130, a controller 120, and a reorder buffer 150. The processor 105 may perform work in response to received instructions and, in doing so, manage the register file 108 as a temporary store of associated values. The processor 105 also accesses the cache 130 to load and store data associated with the work. The cache 130 may include one or more distinct caches, such as a d-cache, a branch target cache, a branch target buffer, a store-load dependence predictor, an instruction cache, a translation buffer, a second level cache, a last level cache, and/or a DRAM cache, and can include caches located on-chip and/or off-chip.
The controller 120 manages the reorder buffer 150 to track the status of instructions assigned to the processor 105. The reorder buffer 150 stores information about the instructions, as well as the order(s) in which the corresponding work product is to be reported. As a result, the processor 105 can execute instructions in an order that maximizes efficiency independent of the order in which the instructions were received, while the reorder buffer 150 enables the work product to be presented in a required order.
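The ordering role of the reorder buffer described above can be illustrated with a small software model. The Python sketch below is a simplification under stated assumptions: the `RobEntry` and `ReorderBuffer` classes and the instruction tags are hypothetical, and real hardware tracks far more state per entry.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    tag: str              # hypothetical instruction label
    done: bool = False
    result: object = None


class ReorderBuffer:
    """Toy model: instructions finish in any order but retire in program order."""

    def __init__(self):
        self.entries = deque()  # oldest instruction at the left

    def issue(self, tag):
        entry = RobEntry(tag)
        self.entries.append(entry)
        return entry

    def complete(self, entry, result):
        entry.done = True
        entry.result = result

    def retire(self):
        # Only completed entries at the head may leave the buffer.
        retired = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            retired.append((entry.tag, entry.result))
        return retired


def demo_in_order_retirement():
    rob = ReorderBuffer()
    i0, i1, i2 = rob.issue("i0"), rob.issue("i1"), rob.issue("i2")
    rob.complete(i2, 30)   # youngest instruction finishes first
    first = rob.retire()   # nothing retires while i0 is pending
    rob.complete(i0, 10)
    second = rob.retire()  # only i0 retires, since i1 still blocks i2
    rob.complete(i1, 20)
    third = rob.retire()   # i1 and i2 retire together, in program order
    return first, second, third
```

In this model the results are always reported in the order the instructions were issued, regardless of the order in which they completed.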
In order to improve the speed and efficiency of execution, the processor 105 may perform branch prediction. When the processor 105 does not have immediate access to all data needed to execute an instruction, it may perform a speculative execution based on the data immediately available to it. If the speculation is validated once the missing data is received, the results are immediately available. Otherwise, incorrect results, produced by a misspeculation, can be discarded.
Typical out-of-order machines are exposed to security vulnerabilities such as those known as Spectre and Meltdown. Those vulnerabilities can arise as a result of a misspeculation. During a speculative execution, data may be loaded into a cache, where it can remain after the speculation is determined to be incorrect. An attacker can then run additional code to access this data. For example, an incorrect speculation due to a branch prediction, jump prediction, ordering violation, or exception may occur as a result of instructions such as the following:
LD a, [ptr]
LD b, [a*k]
The first instruction, when executed by the processor 105, is a load that accesses a memory location whose contents the attacker wants to learn. The second instruction is a load that uses the result of the first to compute an address. The second load causes the memory system to move the memory contents at that address (e.g., in the memory 180) into the cache 130 (e.g., a d-cache). At some point afterwards, the processor 105 determines that the speculation was incorrect. In response, the processor 105 references the record stored to the reorder buffer 150 to back the machine up, restoring its architectural state to its condition prior to the speculation. In typical machines, however, the cache block loaded into the cache 130 remains there, and the attacker can employ certain calculations to determine which block was loaded. If the attacker knows which block was loaded, the attacker may be able to discern the content of the block. This attack takes advantage of the fact that, in typical out-of-order machines, the location of data in the memory hierarchy is not considered part of the architectural state.
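The leak described above can be reproduced in a toy software model. The Python sketch below is not a working exploit and makes several assumptions: the cache is modeled as a set of block numbers, a direct membership test stands in for the timing measurement a real attacker would use, `BLOCK` plays the role of the multiplier k, and `probe_base` is a hypothetical offset added so that the probed region does not overlap the block holding the secret.

```python
BLOCK = 64  # assumed cache-block size in bytes; stands in for k


class ToyMachine:
    """Toy model: rollback restores registers but leaves the cache untouched."""

    def __init__(self, memory):
        self.memory = memory  # address -> value
        self.regs = {}
        self.cache = set()    # block numbers currently cached

    def load(self, reg, addr):
        self.cache.add(addr // BLOCK)  # side effect: the block is cached
        self.regs[reg] = self.memory.get(addr, 0)

    def speculate_and_squash(self, ptr, probe_base):
        saved = dict(self.regs)  # architectural state only
        self.load("a", ptr)                                  # LD a, [ptr]
        self.load("b", probe_base + self.regs["a"] * BLOCK)  # LD b, [a*k]
        self.regs = saved        # misspeculation: registers are restored


def recover_secret(machine, probe_base, candidates=256):
    # Membership in the modeled cache stands in for a fast (cached) access.
    return [v for v in range(candidates)
            if (probe_base + v * BLOCK) // BLOCK in machine.cache]
```

With a memory of `{0x10: 42}`, calling `speculate_and_squash(0x10, 0x1000)` leaves the registers empty, yet `recover_secret(machine, 0x1000)` still identifies the candidate value 42 from the block left behind in the modeled cache.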
Example embodiments can prevent access to information moved as a result of a misspeculation, thereby preventing vulnerability to attacks such as the attack described above. In one embodiment, the controller 120 may track cache blocks that are moved into the cache 130 and associate them with the load/store that incorrectly executed due to misspeculation, storing information regarding those moves to the reorder buffer 150. When the machine 100 detects a misspeculation event, the machine may invalidate some or all of the cache blocks in the cache 130 that were created while executing down the incorrect path. Further embodiments may also manage a translation lookaside buffer (TLB), second level data cache, an instruction cache, or another data store in the same manner.
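The tracking and invalidation described above can be sketched in software as follows. This Python model is illustrative only: `TrackedCache` and `SpeculationTracker` are hypothetical names, the tracker is a stand-in for the controller together with its reorder-buffer records, and the sketch invalidates only blocks that were newly created on the speculative path, which is one of the options the embodiments describe.

```python
class TrackedCache:
    """Toy cache modeled as a set of block numbers."""

    def __init__(self):
        self.blocks = set()

    def fill(self, block):
        # Return True only if this access newly created the block.
        if block in self.blocks:
            return False
        self.blocks.add(block)
        return True


class SpeculationTracker:
    """Hypothetical stand-in for the controller and its reorder-buffer records."""

    def __init__(self, cache):
        self.cache = cache
        self.records = []  # one (tag, created_blocks) record per speculative access

    def speculative_load(self, tag, block):
        created = {block} if self.cache.fill(block) else set()
        self.records.append((tag, created))

    def commit(self):
        # Speculation validated: the blocks stay in the cache.
        self.records.clear()

    def squash(self):
        # Misspeculation: invalidate every block created on the wrong path.
        invalidated = set()
        for _, created in self.records:
            invalidated |= created
        self.cache.blocks -= invalidated
        self.records.clear()
        return invalidated
```

In this sketch, a block that was already cached before the speculation began survives a squash, while a block first brought in by a squashed load is removed, so a later probe of the cache no longer reveals which block the wrong path touched. Invalidating the entire cache instead would be simpler to model but would discard unrelated data.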
Client computer(s)/devices 50 and server computer(s) 60 provide processing, storage, and input/output devices executing application programs and the like. The client computer(s)/devices 50 can also be linked through communications network 70 to other computing devices, including other client computer(s)/devices 50 and server computer(s) 60. The communications network 70 can be part of a remote access network, a global network (e.g., the Internet), a worldwide collection of computers, local area or wide area networks, and gateways that currently use respective protocols (TCP/IP, Bluetooth®, etc.) to communicate with one another. Other electronic device/computer network architectures are suitable.
In one embodiment, the processor routines 92 and data 94 may include a computer program product (generally referenced 92), including a non-transitory computer-readable medium (e.g., a removable storage medium) that provides at least a portion of the software instructions for the system. The computer program product 92 can be installed by any suitable software installation procedure, as is well known in the art. In another embodiment, at least a portion of the software instructions may also be downloaded over a cable, communication, and/or wireless connection.
While example embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the embodiments encompassed by the appended claims.