This application claims the benefit of and priority to Great Britain Application Number 2402746.8, which was filed on Feb. 27, 2024, the contents of which are hereby incorporated by reference in their entirety.
The present technique relates to the field of snoop filtering.
A data processing system may comprise caching agents capable of caching data from shared memory. A coherency protocol may be used to maintain coherency between data cached at the respective caching agents. When one caching agent requests read/write access to data for a given address, this may trigger snoop requests to be sent to one or more other caching agents to check whether data is held for that address at the other caching agents and/or prompt changes in coherency state of the data held at other caching agents (e.g. triggering invalidation of cached data or return of a dirty data value held by another caching agent). As the number of caching agents increases, broadcasting snoop requests to each other caching agent can be extremely expensive in terms of bandwidth and may slow down performance by blocking processing of other requests, and so snoop filtering circuitry may be provided to at least partially track information about the addresses for which data is cached at a particular caching agent, so that some snoop requests can be suppressed if it is known that certain caching agents do not hold data for the relevant address.
At least some examples of the present technique provide an apparatus comprising: request receiving circuitry to receive a given memory system request specifying a target address in a given physical address space and a target memory encryption context identifier indicative of a selected memory encryption context associated with the memory system request, the selected memory encryption context comprising one of a plurality of memory encryption contexts associated with the given physical address space; and snoop filtering circuitry to determine whether a snoop request is to be transmitted to a given caching agent in response to the given memory system request; in which: the snoop filtering circuitry is configured to: determine, based on the target memory encryption context identifier of the given memory system request and on snoop filtering information associated with the given caching agent, whether the target memory encryption context identifier is a snoop-not-required memory encryption context identifier for the given caching agent; and in response to determining that the target memory encryption context identifier is a snoop-not-required memory encryption context identifier for the given caching agent, suppressing transmission of a snoop request to the given caching agent in response to the given memory system request.
At least some examples of the present technique provide a system comprising: the apparatus described above, implemented in at least one packaged chip; at least one system component; and a board, wherein the at least one packaged chip and the at least one system component are assembled on the board.
At least some examples of the present technique provide a chip-containing product comprising the system described above assembled on a further board with at least one other product component.
At least some examples of the present technique provide computer-readable code for fabrication of an apparatus comprising: request receiving circuitry to receive a given memory system request specifying a target address in a given physical address space and a target memory encryption context identifier indicative of a selected memory encryption context associated with the memory system request, the selected memory encryption context comprising one of a plurality of memory encryption contexts associated with the given physical address space; and snoop filtering circuitry to determine whether a snoop request is to be transmitted to a given caching agent in response to the given memory system request; in which: the snoop filtering circuitry is configured to: determine, based on the target memory encryption context identifier of the given memory system request and on snoop filtering information associated with the given caching agent, whether the target memory encryption context identifier is a snoop-not-required memory encryption context identifier for the given caching agent; and in response to determining that the target memory encryption context identifier is a snoop-not-required memory encryption context identifier for the given caching agent, suppressing transmission of a snoop request to the given caching agent in response to the given memory system request.
At least some examples provide a storage medium storing the computer-readable code. The storage medium may be a non-transitory storage medium.
At least some examples provide a method comprising: receiving a given memory system request specifying a target address in a given physical address space and a target memory encryption context identifier indicative of a selected memory encryption context associated with the memory system request, the selected memory encryption context comprising one of a plurality of memory encryption contexts associated with the given physical address space; determining, based on the target memory encryption context identifier of the given memory system request and on snoop filtering information associated with a given caching agent, whether the target memory encryption context identifier is a snoop-not-required memory encryption context identifier for the given caching agent; and in response to determining that the target memory encryption context identifier is a snoop-not-required memory encryption context identifier for the given caching agent, suppressing transmission of a snoop request to the given caching agent in response to the given memory system request.
Further aspects, features and advantages of the present technique will be apparent from the following description of examples, which is to be read in conjunction with the accompanying drawings, in which:
Some data processing systems may support the ability to assign, to a given memory system request requesting a memory access to a target address in a given physical address space, a target memory encryption context identifier (MECID) which indicates a selected memory encryption context associated with the memory system request. The MECID distinguishes the selected memory encryption context from other encryption contexts associated with the given physical address space. Use of MECIDs can be useful to enable different subsets of physical addresses within a given physical address space to be treated differently for the purpose of handling encryption/decryption of data stored in a memory system, to provide for confidentiality of data stored by different software processes coexisting in the same physical address space.
The inventors have recognised that MECIDs can be useful information for filtering snoop requests sent to at least one caching agent in a data processing system, since the MECID can be a proxy for identifying a given software workload. The MECID-based snoop filtering information can be used either instead of, or in addition to, use of address-based snoop filtering information looked up based on the physical address of the memory access. Use of MECID-based snoop filtering allows, for a given level of performance in reduction of unnecessary transmission of snoop requests, a reduction in the amount of snoop filter state used for coherency tracking at the snoop filtering circuitry compared to a purely address-based snoop filtering scheme, since a single entry associated with a given MECID can be used to filter out snoop requests for many physical addresses associated with that MECID, rather than needing individual snoop filter entries per physical address. Hence, using MECIDs for snoop filtering can enable a more circuit-area-efficient snoop filter design than purely address-based snoop filtering approaches.
Hence, an apparatus comprises request receiving circuitry to receive a given memory system request specifying a target address in a given physical address space and a target memory encryption context identifier (MECID) indicative of a selected memory encryption context associated with the memory system request, the selected memory encryption context comprising one of a plurality of memory encryption contexts associated with the given physical address space; and snoop filtering circuitry to determine whether a snoop request is to be transmitted to a given caching agent in response to the given memory system request. The snoop filtering circuitry is configured to: determine, based on the target MECID of the given memory system request and on snoop filtering information associated with the given caching agent, whether the target MECID is a snoop-not-required MECID for the given caching agent; and in response to determining that the target MECID is a snoop-not-required MECID for the given caching agent, suppressing transmission of a snoop request to the given caching agent in response to the given memory system request. By suppressing transmission of a snoop request when the MECID is determined to be a snoop-not-required MECID, this can improve performance, not only because the bandwidth that would otherwise be consumed by the suppressed snoop request can be used for other requests, but also because a memory system request that would depend on a response to the suppressed snoop request can be serviced earlier if the latency of the given caching agent responding to the suppressed snoop request can be eliminated. 
By using MECIDs to handle snoop filtering, a snoop filter with a given amount of circuit area budget for storing and maintaining snoop filter state information can achieve a greater amount of suppression of unnecessary snoop requests, to provide improved performance in comparison to a purely physical address-based snoop filter with an equivalent circuit area budget.
The MECID-based snoop filtering approach can be applied in any system where memory system requests can be tagged with MECIDs and coherency is to be maintained between data cached at respective caching agents.
However, the use of MECIDs for snoop filtering may be particularly useful where the given caching agent comprises a coherent device. In the field of memory systems, the term “device” may refer to non-CPU circuitry capable of issuing memory access requests to the memory system of a host system comprising at least one processor capable of instruction execution (e.g. a central processing unit (CPU) or graphics processing unit (GPU)). For example, the device could be an I/O (input/output) device or hardware accelerator. The host system may not be able to trust that the device design has been validated by the same provider as the provider of the host system, as the same host system design could be used in combination with a diverse range of devices. In some cases, devices may be external to the chip(s) implementing the host system and may be coupled to the host via an interface such as a PCIe® (Peripheral Component Interconnect Express) interface. While some devices may not be capable of acting as caching agents (so need to be coherent with the host system only in the sense that the memory access requests issued by the device cause the latest version of data held at other caching agents to be used for servicing those requests), there is increasing use of fully coherent devices having their own private cache which is to be kept coherent with data within the host system.
Hence, the coherent device may itself act as a caching agent that is part of the coherency protocol, which may potentially need to be snooped to identify whether it holds data for a target address subject to a read/write request from another source of memory access requests. However, the inventors recognised that snoop traffic to a coherent device may cause a security risk as the snoop requests may leak information on address access patterns made by other requesters, which could be exploited by attackers aiming to compromise sensitive software running on the host system. Also, as the validation of device designs may be outside the control of the provider of the host system, an attacker could implement a rogue device which issues fake responses to snoop requests (e.g. returning bogus data purporting to be the latest dirty data value for a given address), which could potentially compromise data stored in shared memory and cause incorrect functioning or other security violations for sensitive software executing on the host system.
Therefore, to improve security, it may be desirable to be able to prevent snoop requests being sent to a given coherent device when that coherent device does not hold any data for the target address of the snoop request. Maintaining a precise record of which physical addresses are cached at the given coherent device may be unacceptably expensive in terms of circuit area and power overhead. However, by using MECIDs to track what information is cached at the coherent device, the overhead can be greatly reduced, as a single MECID-based tracking entry may serve to track a larger set of physical addresses cached at the device. Hence, when MECID-based snoop filtering is used in conjunction with a coherent device, this can provide a security benefit and greatly reduce the circuit area cost of the snoop filter compared to purely physical address based snoop filtering.
In some examples, the coherent device may be a device compatible with the CXL® (Compute Express Link®) protocol, which is a cache-coherent interconnect protocol that supports use of fully coherent devices. It will be appreciated that other coherency protocols could also be used.
The request receiving circuitry and snoop filtering circuitry may be implemented at different locations within a processing system, depending on implementation choice. The component which comprises the request receiving circuitry and snoop filtering circuitry may be a stand-alone sub-system within the overall data processing system. The sub-system comprising the request receiving circuitry and snoop filtering circuitry could be licensed for manufacture independently from other parts of the data processing system (regardless of whether the sub-system is on a different chip to other components of the data processing system or on the same integrated circuit as other components of the data processing system, the sub-systems may nevertheless be licensed independently). Therefore, the caching agents themselves and other system elements (such as processing circuitry (e.g. a CPU) for assigning MECIDs to be used for certain memory access requests or home node circuitry for generating snoop requests) do not always need to be present in the same component as the request receiving circuitry and snoop filtering circuitry.
In some examples, the request receiving circuitry and snoop filtering circuitry may be implemented on a path via which snoop traffic is routed from home node circuitry to the given caching agent. Hence, the given memory system request received by the request receiving circuitry may comprise a snoop request targeting the given caching agent.
In this case, the home node circuitry which issued the snoop request may be expecting a response to the snoop request, even if the snoop filtering circuitry determines that transmission of the snoop request to the given caching agent is to be suppressed. Hence, in response to determining that the target MECID is a snoop-not-required MECID for the given caching agent, the snoop filtering circuitry may return a no-data response in response to the snoop request, the no-data response indicating that a private cache of the given caching agent does not hold valid data for the target address. By synthesizing a no-data response, without actually forwarding the snoop request to the given caching agent to query the coherency state of data for the target address at the given caching agent, system performance can be improved as the latency of the given caching agent responding to the snoop request can be eliminated from the time taken for the home node circuitry to receive the required snoop response, enabling memory access requests which depend on snoop resolution on average to be processed earlier (the actual latency of the memory access request may also depend on outcomes of other snoops on which the memory access requests depend, but on average the latency can be reduced by the MECID-based snoop filtering because the number of caching agents for which snoop responses are awaited can be reduced).
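As a purely illustrative software model of the behaviour described above (not the circuitry itself), the following minimal Python sketch shows a filter on the snoop path that synthesizes a no-data response instead of forwarding the snoop. The names `SnoopRequest`, `PortSnoopFilter`, `NO_DATA` and `forward_to_device` are hypothetical, and the sketch assumes the snoop filtering information is held as a set of MECIDs for which snoops are required, which is only one of the options discussed in this description.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class SnoopRequest:
    address: int  # target physical address
    mecid: int    # target memory encryption context identifier


# Hypothetical response encoding; a real coherency protocol defines its own.
NO_DATA = "NO_DATA"


class PortSnoopFilter:
    """Illustrative model of a filter between home node and one caching agent."""

    def __init__(self, snoop_required_mecids: Set[int],
                 forward_to_device: Callable[[SnoopRequest], str]):
        # Assumption: the filter state lists MECIDs that are NOT snoop-not-required.
        self.snoop_required_mecids = set(snoop_required_mecids)
        self.forward_to_device = forward_to_device

    def handle_snoop(self, req: SnoopRequest) -> str:
        if req.mecid not in self.snoop_required_mecids:
            # Snoop-not-required MECID: answer on the agent's behalf, so the
            # home node still gets the response it expects.
            return NO_DATA
        # Otherwise the snoop is passed on and the agent's own response returned.
        return self.forward_to_device(req)
```

In this sketch the caching agent never observes a suppressed snoop, which models both the latency saving and the reduced exposure of access patterns to the device.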
Although such MECID-based snoop filtering could be implemented at any point on the path taken by snoop requests to a coherent device, it can be useful to implement the snoop filtering circuitry at a port for connecting a host system to one or more devices. The port acts as a gateway for memory access requests/responses entering the host system from the associated devices and requests/responses transmitted from the host system to the associated device(s), and so can be a convenient point at which request traffic for a set of one or more devices can be monitored for maintaining the snoop filtering information and at which snoop requests for the associated set of devices can be intercepted and suppressed.
Some examples may (in addition to snoop filtering at ports for devices, or instead of snoop filtering at ports for devices) provide the snoop filtering circuitry at home node circuitry which manages coherency for a plurality of caching agents. The home node circuitry may be the source of the snoop requests sent to each caching agent subject to a coherency scheme. For examples where the request receiving circuitry and snoop filtering circuitry are located at the home node circuitry, the given memory system request may comprise a read/write request received from a requesting caching agent. Hence, for implementations where the MECID-based snoop filtering is implemented at the home node circuitry, in response to the given memory system request (read/write request), the snoop filtering circuitry may determine based on the snoop filtering information a snoop-not-required subset of the caching agents for which the target MECID is a snoop-not-required MECID, and suppress transmission of snoop requests to the snoop-not-required subset of caching agents in response to the given memory system request.
For home node based snoop filters using MECID-based snoop filtering information, the MECID-based snoop filtering information may not be the only type of snoop filtering information provided. The snoop filtering circuitry may also look up address-based snoop filtering information based on the target address of the given memory system request, to identify whether the target address is a snoop-required target address for the given caching agent. Sometimes, due to lack of information available to the home node circuitry or imprecision in the tracking provided by the snoop filtering information, it may be the case that the MECID-based snoop filtering information does not indicate that the target MECID has data held at a particular caching agent, but the address-based snoop filtering information does indicate that the target address has information cached at that caching agent. In this case, the address-based snoop filtering information may take priority, to ensure that coherency of any data cached at the caching agent for the target address can be maintained. Therefore, in response to determining that the target address is a snoop-required target address for the given caching agent, the snoop filtering circuitry may determine that a snoop request should be transmitted to the given caching agent in response to the given memory system request, even when the snoop filtering information indicates that the target MECID is a snoop-not-required MECID for the given caching agent.
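The home node decision described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `agents_to_snoop`, the use of per-agent Python sets for both kinds of snoop filtering information, and the exclusion of the requester from the snoop targets are all assumptions made for the example.

```python
from typing import Dict, List, Set


def agents_to_snoop(target_address: int,
                    target_mecid: int,
                    requester: str,
                    agents: List[str],
                    mecid_not_required: Dict[str, Set[int]],
                    address_required: Dict[str, Set[int]]) -> List[str]:
    """Return the caching agents to which a snoop request would be sent.

    mecid_not_required[agent]: MECIDs known to be snoop-not-required for agent.
    address_required[agent]:   addresses known to be snoop-required for agent.
    """
    targets = []
    for agent in agents:
        if agent == requester:
            continue  # assumption: the requesting agent is not snooped
        if target_address in address_required.get(agent, set()):
            # Address-based information takes priority over MECID-based
            # information, so the snoop is sent regardless of the MECID.
            targets.append(agent)
            continue
        if target_mecid in mecid_not_required.get(agent, set()):
            continue  # snoop-not-required MECID: suppress the snoop
        targets.append(agent)  # no reason to filter: send the snoop
    return targets
```

For example, with `mecid_not_required = {"cpu1": {5}, "dev0": {5}}` and `address_required = {"dev0": {0x80}}`, a request from `"cpu0"` for address `0x80` with MECID 5 would still snoop `"dev0"`, because the address-based information overrides the MECID-based suppression.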
A wide variety of approaches are possible for maintaining the MECID-based snoop filtering information, which can vary in the precision with which the snoop filtering information tracks which MECIDs are cached at a given caching agent.
Some approaches may aim to precisely track each MECID which has data cached in a given coherency state (e.g. unique coherency state) at a given caching agent. For example, if the aim is to improve security by reducing likelihood of snoop requests being sent to a coherent device relating to a software context which does not require use of that device, it may be desirable to precisely track all the MECIDs which have data held in a particular coherency state (e.g. unique coherency state) at the coherent device.
In some examples, the snoop filtering information may specify one or more MECIDs which cannot be regarded as a snoop-not-required MECID for the given caching agent (for example, because those MECIDs have been detected as having been specified for requests to allocate data to the given caching agent's private cache). In this case, a MECID other than those MECIDs indicated in the snoop filtering information could be regarded as a snoop-not-required MECID. This approach can be useful for improving security by reducing unnecessary snoop traffic to a coherent device, as in practice the number of MECIDs for which data is cached at that device may be relatively limited and so it can be practical to maintain a precise record of each MECID with data held at the coherent device.
With the approach where the snoop filtering information contains a record of those MECIDs which are not snoop-not-required MECIDs, any other MECID may be considered a snoop-not-required MECID. Hence, if a given MECID is looked up against the snoop filtering information and a miss is detected, the given MECID may be assumed to be a snoop-not-required MECID, and so transmission of a snoop request specifying the given MECID to the given caching agent may be suppressed.
On the other hand, other examples may use a different approach, where the MECID-based snoop filtering information could for example track a set of MECIDs known to be snoop-not-required MECIDs for a given caching agent, or provide tracking information for a set of MECIDs, the tracking information for each MECID indicating whether or not that MECID is a snoop-not-required MECID for the given caching agent.
For example, this could be useful for the home node based snoop filtering approach, where once a given MECID is detected as being in use for one caching agent, the snoop filtering circuitry can track whether that MECID is seen for any requests allocating data to caches of other caching agents, and so for a given one of those other caching agents not having seen any requests tagged with the given MECID, the given MECID can be identified as a snoop-not-required MECID. In practice, it may not be practical for the snoop filtering circuitry to track such information for every possible value of the MECID, and so on some occasions a lookup of the MECID-based snoop filtering information for a target MECID may detect a miss when there is no valid information specifying whether the target MECID is a snoop-not-required MECID for a given caching agent. With this approach, on a miss in the MECID-based snoop filtering information it may be assumed that it is unknown whether data for the target address and the target MECID is cached at the given caching agent, and so in the absence of any specific indication that the required data is not cached at the given caching agent, the snoop request may still be transmitted to the given caching agent. Nevertheless, the MECID-based snoop filtering allows some instances of snoop requests to be filtered out when it is detected that a given MECID definitely does not have any data cached at a given caching agent.
Hence, in scenarios when the target MECID is looked up in the snoop filtering information but no stored information is found for that target MECID, the response taken can vary depending on the implementation. In use cases (e.g. with snoop filtering applied for security reasons at a port associated with a coherent device) where the snoop filtering information tracks MECIDs for which snoops are required for the given caching agent, the miss in the snoop filtering information for a target MECID may cause the snoop request to be suppressed from being sent to the given caching agent. In use cases (e.g. with snoop filtering applied at the home node) where the snoop filtering information may identify, for a subset of MECIDs, which caching agents hold data for those MECIDs, then if there is a miss in the snoop filtering information for a target MECID, unless address-based snoop filtering can be used to filter out the snoop request, the snoop request may be broadcast to all caching agents (although it may then be subject to further snoop filtering at a port to a coherent device if the port-based snoop filtering is also implemented).
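The two miss-handling behaviours contrasted above can be summarised in a short sketch. The enum `MissPolicy`, the function `should_send_snoop` and the dictionary representation of the tracked state are hypothetical names chosen for illustration only.

```python
from enum import Enum
from typing import Dict


class MissPolicy(Enum):
    SUPPRESS_ON_MISS = 1  # e.g. port filter tracking snoop-required MECIDs
    SNOOP_ON_MISS = 2     # e.g. home node filter tracking a subset of MECIDs


def should_send_snoop(tracked: Dict[int, bool], mecid: int,
                      policy: MissPolicy) -> bool:
    """tracked maps a MECID to True if a snoop is required for that MECID."""
    snoop_required = tracked.get(mecid)
    if snoop_required is None:
        # Miss: nothing is stored for this MECID, so the outcome depends on
        # which kind of information the filter is assumed to hold.
        return policy is MissPolicy.SNOOP_ON_MISS
    return snoop_required
```

Under `SUPPRESS_ON_MISS` an untracked MECID is treated as snoop-not-required, whereas under `SNOOP_ON_MISS` the snoop is sent because it is unknown whether the caching agent holds data for that MECID.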
The snoop filtering information can be maintained in different ways. In some examples, the snoop filtering information comprises software-managed snoop filtering information. Particularly in examples where the MECID-based snoop filtering is implemented at a port for a device, hardware circuitry for maintaining the snoop filtering information indicating which MECIDs have data cached at a given device may not be considered justified, as typically the number of MECIDs in use for a given device may be small and may be entirely known to software running on the data processing system (e.g. privileged software which may have configured the device to give it access to the portions of address space associated with use of a particular MECID). Hence, by providing a software programming interface by which software is able to configure the snoop filtering information to define which MECIDs are snoop-not-required MECIDs for a given caching agent, the complexity of the snoop filtering circuitry can be reduced as there is no need to provide hardware monitoring circuitry to monitor requests to identify MECIDs in use at a given caching agent.
Hence, the snoop filtering circuitry may comprise programming interface circuitry to set the software-managed snoop filtering information in response to a programming request triggered by privileged software. For example, the programming interface may comprise a set of memory-mapped registers storing the snoop filtering information, where those memory-mapped registers can be exposed to software at certain addresses within a given memory address space. Alternatively, the programming interface may not expose the snoop filtering information to software directly, but can expose a request buffer or other programming interface data structure in the address space accessible to software. When software writes a programming request to the request buffer or other programming data structure, some hardware circuitry could respond to the programming request to update the snoop filtering information according to the information specified in the programming request. For example, the programming request may identify one or more MECIDs to be specified as being authorized for use with the given caching agent, so that those MECIDs would not be regarded as snoop-not-required MECIDs, or may identify one or more MECIDs previously specified as not being snoop-not-required MECIDs which should be removed from the list of authorized (not snoop-not-required) MECIDs. Access to the programming interface may in some examples be restricted to privileged software (with unprivileged software unable to program the snoop filtering information). For example, this access control can be implemented based on page table structures which may control whether the addresses mapped to the programming interface are accessible to less privileged software.
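A software model of such a programming interface might look like the sketch below. It is offered only as an illustration: the class `ProgrammingInterface`, the operation names `"authorize"` and `"revoke"`, and the boolean `privileged` flag (standing in for access control that, as noted above, may in practice be enforced through page table structures) are all assumptions.

```python
from typing import Iterable, Set


class ProgrammingInterface:
    """Illustrative model of software-managed snoop filtering information."""

    def __init__(self) -> None:
        # MECIDs authorized for the caching agent (not snoop-not-required).
        self._authorized: Set[int] = set()

    def program(self, op: str, mecids: Iterable[int], privileged: bool) -> bool:
        """Apply a programming request; returns False if it is rejected."""
        if not privileged:
            return False  # unprivileged software cannot change the filter state
        if op == "authorize":
            self._authorized.update(mecids)
        elif op == "revoke":
            self._authorized.difference_update(mecids)
        else:
            raise ValueError(f"unknown programming operation: {op}")
        return True

    def is_snoop_not_required(self, mecid: int) -> bool:
        # Any MECID not explicitly authorized is treated as snoop-not-required.
        return mecid not in self._authorized
```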
In some examples, the snoop filtering information comprises hardware-managed snoop filtering information set by the snoop filtering circuitry based on monitoring of request traffic for the given caching agent. By providing hardware for monitoring request traffic and deducing which MECIDs are in use for a given caching agent, this reduces the responsibility of software to track this information and so can make software development simpler.
One way of implementing hardware-managed snoop filtering information can be to provide a counter for tracking, for at least a subset of MECIDs, the number of cache lines held at a given caching agent for those MECIDs. Request traffic to the given caching agent may be monitored to update the snoop filtering information in response to detection of allocation of new lines in the private cache of the given caching agent, or invalidations of previously allocated lines from the given caching agent's private cache. Hence, in some examples, the snoop filtering circuitry may determine that the target MECID is a snoop-not-required MECID for the given caching agent, in response to determining that a cache line counter associated with the target MECID and the given caching agent indicates that a private cache of the given caching agent holds zero cache lines associated with the target MECID.
In some examples, the cache line counter may track valid cache lines for a given MECID that are held in the given caching agent's private cache.
Other examples may support tracking valid cache lines for a restricted subset of coherency states (not necessarily all valid cache lines, if there are some coherency states for which no snoop would be needed due to specific details of the implemented coherency protocol). For example, in some examples, the cache line counter may count the number of valid cache lines for a given MECID that are in a unique coherency state which indicates that no other caching agent holds valid data for the cache line in its private cache. In this case, the snoop filtering circuitry may maintain, for the given caching agent, a set of cache line counters each associated with a corresponding MECID. In response to detecting a request to allocate a cache line associated with a specified MECID to a private cache of the given caching agent in the unique coherency state, the snoop filtering circuitry is configured to adjust the cache line counter corresponding to the specified MECID in a first direction (e.g. the counter being incremented or decremented). In response to detecting an indication that a cache line associated with a specified MECID has transitioned from the unique coherency state to a non-unique (e.g. shared or invalid) coherency state in the private cache of the given caching agent, the snoop filtering circuitry may adjust the cache line counter corresponding to the specified MECID in a second direction (opposite to the first direction). This approach can be a relatively simple-to-implement scheme for enabling hardware to monitor request traffic to maintain MECID-based snoop filtering information, to reduce the responsibility of software to track which MECIDs are used by each device.
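The counter scheme can be modelled in a few lines of Python. This is a sketch under assumptions: the class `UniqueLineCounters` and its method names are hypothetical, the first direction is taken to be an increment, a MECID is treated as snoop-not-required when its counter shows that no unique lines are held, and the counter is clamped at zero as a defensive choice that the description does not specify.

```python
from typing import Dict


class UniqueLineCounters:
    """Per-MECID counters of unique-state lines held by one caching agent."""

    def __init__(self) -> None:
        self._counts: Dict[int, int] = {}

    def on_allocate_unique(self, mecid: int) -> None:
        # First direction (assumed here to be an increment).
        self._counts[mecid] = self._counts.get(mecid, 0) + 1

    def on_leave_unique(self, mecid: int) -> None:
        # Second direction: a line moved to a non-unique state.
        # Clamping at zero is an assumption made for this sketch.
        current = self._counts.get(mecid, 0)
        if current > 0:
            self._counts[mecid] = current - 1

    def is_snoop_not_required(self, mecid: int) -> bool:
        # No unique lines recorded for this MECID at the caching agent.
        return self._counts.get(mecid, 0) == 0
```

If the caching agent drops a line without notifying the filter, the counter simply stays above zero, so the model errs towards sending a snoop rather than wrongly suppressing one.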
Such counter-based snoop filtering information may not always be precise. The snoop filtering circuitry may not always receive indications from the caching agent every time a transition of coherency state occurs for data associated with a MECID being tracked in the snoop filtering information. For example, sometimes a given caching agent may silently invalidate data at its cache without informing the snoop filtering circuitry. Hence, the snoop filtering information may sometimes be imprecise, but the imprecision may be limited so that false positive detection of presence of unique cache lines in the private cache of the given caching agent may be allowed, but it may not be possible to have a false negative detection of absence of unique cache lines in the private cache of the given caching agent when the cache line is actually held in the unique state at the private cache of the given caching agent. Meeting this level of precision can be achievable in practice because the transitions to a unique state may generally be based on explicit read/write requests from the given caching agent itself, which will require explicit signals to be sent to the home node and so are easily detectable, whereas the transitions of a cache line for a given address from the unique state to another coherency state (e.g. invalid) may sometimes be prompted internally at the given caching agent without any need for a signal to be issued to the home node, so may not always be communicated to the snoop filtering circuitry.
In some examples, the snoop filtering information for the given caching agent comprises a set of snoop filtering indicators each associated with a respective MECID and indicating whether that MECID is a snoop-not-required MECID for the given caching agent. For example, the snoop filtering indicators may comprise a bitmap where each bit indicates whether a corresponding MECID is a snoop-not-required MECID for a particular caching agent. This can be a relatively simple way of implementing an indication of each MECID which is in use or is not in use for a given caching agent.
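A minimal sketch of such a bitmap, for a single caching agent, is given below. It is a software model only; the class name, the number of MECIDs and the choice that a set bit means "snoop-not-required" are all assumptions made for illustration.

```python
class MecidBitmap:
    """Hypothetical bitmap of snoop filtering indicators for one caching agent."""

    def __init__(self, num_mecids):
        self.num_mecids = num_mecids
        # Assumed encoding: a set bit means "snoop-not-required" for that MECID.
        # All bits start clear, so every MECID is snooped until marked otherwise.
        self.bits = 0

    def _check(self, mecid):
        if not 0 <= mecid < self.num_mecids:
            raise ValueError("MECID out of range")

    def mark_snoop_not_required(self, mecid):
        self._check(mecid)
        self.bits |= 1 << mecid

    def mark_snoop_required(self, mecid):
        self._check(mecid)
        self.bits &= ~(1 << mecid)

    def is_snoop_not_required(self, mecid):
        self._check(mecid)
        return bool((self.bits >> mecid) & 1)


bitmap = MecidBitmap(num_mecids=16)
bitmap.mark_snoop_not_required(5)
print(bitmap.is_snoop_not_required(5))  # True
print(bitmap.is_snoop_not_required(6))  # False
```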
In some examples, the snoop filtering information comprises a set of MECID tracking entries for the given caching agent. Each MECID tracking entry, when valid, may specify a corresponding MECID and information indicative of whether the corresponding MECID is a snoop-not-required MECID for the given caching agent. For example, the information indicating whether the corresponding MECID is a snoop-not-required MECID could be any of:
If the snoop filtering information does not have sufficient capacity to track snoop filtering information for every encodable MECID at a given time, then from time to time the snoop filtering circuitry may reallocate which MECIDs are allocated valid MECID tracking entries.
For example, in response to detecting a request capable of allocating data associated with a new MECID to a private cache of the given caching agent, when the new MECID is not already tracked using one of the set of memory encryption context tracking entries, the snoop filtering circuitry may allocate one of the MECID tracking entries for the new MECID.
If such new allocation requires eviction of a previously used MECID tracking entry, then this could be handled in different ways depending on the use case.
For examples such as the home node based MECID snoop filter, the default on a miss in the MECID-based snoop filtering information (in the absence of any other reason to deduce that the snoop can be filtered out for a given caching agent) is that the snoop request should be sent to that given caching agent. In such examples, it may be acceptable simply to discard the information from a victim MECID tracking entry that is selected for replacement with information for a newly allocated MECID.
However, in examples where the MECID-based snoop filtering information is more precise and is used to track each MECID which does have data cached at the given caching agent (e.g. in the port-based snoop filter approach used to reduce snoop traffic to a coherent device for security reasons), then the default may be to suppress snoops if there is a miss in the lookup of the MECID-based snoop filtering information. In this case, simply discarding information from a victim MECID tracking entry may risk loss of coherency if data for an address that is cached at the given caching agent is not snooped when another caching agent requires access to the data. Hence, in some examples, in response to detecting that allocating one of the MECID tracking entries for the new MECID requires replacement of a MECID tracking entry previously allocated for a victim MECID, the snoop filtering circuitry may trigger invalidation of any cache lines associated with the victim MECID from a private cache of the given caching agent. For example, the snoop filtering circuitry may issue a single invalidation bus command to the given caching agent specifying a given MECID (corresponding to the victim MECID whose tracking entry was reallocated for the new MECID). The bus command (generated by hardware when the replacement policy determines that a valid MECID tracking entry should be evicted) requests that the given caching agent acts upon that request by invalidating any cache line associated with the given MECID (regardless of which physical address is associated with the invalidated cache line having that MECID).
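The difference between the two eviction behaviours can be sketched as follows. This is an illustrative model rather than a definitive implementation: the function name, the dictionary used for the tracking entries and the callback standing in for the invalidation bus command are all hypothetical.

```python
def evict_tracking_entry(entries, victim, miss_means_snoop, invalidate_by_mecid):
    """Remove the victim MECID's tracking entry from a hypothetical table.

    entries: dict mapping MECID -> tracking information.
    miss_means_snoop: True if a lookup miss defaults to sending the snoop
        (home node style), False if a miss defaults to suppressing it
        (port style).
    invalidate_by_mecid: callback modelling the invalidation bus command.
    """
    if not miss_means_snoop:
        # A later miss would suppress snoops, so any lines cached for the
        # victim MECID must be invalidated before the entry is lost.
        invalidate_by_mecid(victim)
    # In both cases the tracking information itself is discarded.
    del entries[victim]


invalidated = []

home_node_entries = {1: "info", 2: "info"}
evict_tracking_entry(home_node_entries, 1, True, invalidated.append)
print(invalidated)  # []: discarding is safe because a miss still snoops

port_entries = {1: "info", 2: "info"}
evict_tracking_entry(port_entries, 1, False, invalidated.append)
print(invalidated)  # [1]: lines for MECID 1 are invalidated at the device
```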
The apparatus may comprise memory encryption/decryption circuitry responsive to a memory access request specifying a given address and a specified MECID, to perform encryption or decryption of data associated with the given address based on key information selected based on the specified MECID. For example, the key information could be an encryption/decryption key, or could be a key modifier or tweak which is used in an encryption/decryption algorithm to modify a separate encryption/decryption key (the modifier or tweak having fewer bits than the key itself). Hence, by selecting key information using the MECID, data associated with different MECIDs can be encrypted/decrypted differently so that the same data value may be represented differently in memory when written to memory by requests associated with different MECIDs.
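The effect of selecting key information by MECID can be illustrated with the toy model below. The XOR keystream construction is an assumption chosen only to keep the sketch short; it is not a secure cipher and is not the algorithm used by the memory encryption/decryption circuitry. The 16-bit MECID width, the mixing of the address into the keystream and all names are likewise hypothetical.

```python
import hashlib


def toy_encrypt(data, base_key, mecid, address):
    """Toy XOR cipher for illustration only; NOT a real or secure algorithm."""
    # The MECID acts as a tweak mixed with an assumed shared base key.
    seed = base_key + mecid.to_bytes(2, "little") + address.to_bytes(8, "little")
    keystream = hashlib.sha256(seed).digest()
    if len(data) > len(keystream):
        raise ValueError("toy cipher handles at most 32 bytes")
    return bytes(d ^ k for d, k in zip(data, keystream))


def toy_decrypt(data, base_key, mecid, address):
    # XOR with the same keystream reverses the encryption.
    return toy_encrypt(data, base_key, mecid, address)


key = b"example-base-key"
plain = b"same data value!"
c1 = toy_encrypt(plain, key, mecid=1, address=0x8000)
c2 = toy_encrypt(plain, key, mecid=2, address=0x8000)
print(c1 != c2)  # the same value is represented differently per MECID
print(toy_decrypt(c1, key, mecid=1, address=0x8000) == plain)
print(toy_decrypt(c1, key, mecid=2, address=0x8000) == plain)
```

In this sketch, decrypting with the MECID used for encryption recovers the original value, whereas decrypting with a different MECID does not.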
Specific examples are set out below with reference to the drawings.
Hence, a number of processing elements 6 may each have access to shared memory 12 within the host system 4. However, in addition to the processing elements 6 themselves, another source of memory access requests to shared memory 12 can be from devices 20, 22 coupled to the host system via corresponding root ports 26. Each root port 26 acts as a gateway to the host system for a corresponding device or group of devices. Although
The interconnect 10 may be associated with home node circuitry 32 which is responsible for maintaining coherency between cached data held at private caches 8, 24 of a number of caching agents of the data processing system 2. The caching agents can include the processing elements 6 as well as any coherent devices 22 which have their own coherent private cache 24 (other devices 20 may be non-caching devices (or “I/O coherent” devices) which do not have a private cache that needs to remain coherent with the host device). For example, a coherent device 22 could be a device for which the interface between the device 22 and host system 4 is compatible with the CXL® (Compute Express Link®) standard.
The home node circuitry 32 implements a given coherency protocol, which defines a set of request types and response protocols associated with those request types. Each address may, with respect to a particular caching agent, be considered to be held in that caching agent's private cache in a particular coherency state. For example, the coherency state may specify, with respect to a given address and a given caching agent 6, 22, whether valid data for that address is held at the given caching agent's private cache 8, 24, and if valid data is held, whether that data is clean or dirty, and/or is held in a unique or shared state (unique data being held exclusively in that caching agent's cache, and not in other caching agents' caches, and shared data being capable of also being held in other caching agents' caches). The coherency protocol may require that certain request types or responses to such requests may be associated with certain transitions of coherency state for cached items of data associated with the target address of the request. When a read/write request is received from one of the caching agents 6, 22 or an I/O coherent device 20 requesting a read/write operation to a given physical address, the home node circuitry 32 issues snoop requests to one or more other caching agents that could potentially hold valid cached data for that physical address. A snoop request may query the current coherency state of the cached data for a specified address at a corresponding caching agent, and/or trigger changes in coherency state at the caching agent (e.g. invalidating cached data if the requester of the original read/write request requires the data to be cached in the unique state in its cache, and/or causing return of dirty data held in a snooped caching agent's cache 8 so that the dirty data can be made accessible to the requester which sent the read/write request).
As shown in
In some examples the system cache 34 and snoop filter 36 may be combined, with a single structure looked up based on an address providing both cached data and snoop filter information associated with that address.
As mentioned further below, in some instances, further snoop filtering circuitry 40 can be provided at the root port 26 associated with at least one coherent device 22, to provide for further filtering of snoop requests targeting that device.
As shown in
As shown in
In some processing systems, all virtual addresses may be mapped by the address translation circuitry onto a single physical address space which is used by the memory system to identify locations in memory to be accessed. In such a system, control over whether a particular software process can access a particular address is provided solely based on the page table structures used to provide the virtual-to-physical address translation mappings. However, such page table structures may typically be defined by an operating system and/or a hypervisor. If the operating system or the hypervisor is compromised then this may cause a security leak where sensitive information may become accessible to an attacker.
Therefore, for some systems where there is a need for certain processes to execute securely in isolation from other processes, the system may support operation in a number of domains and a number of distinct architectural physical address spaces 84 may be supported, where for at least some components of the memory system, memory access requests whose virtual addresses are translated into physical addresses in different architectural physical address spaces 84 are treated as if they were accessing completely separate addresses in memory, even if the physical addresses in the respective physical address spaces actually correspond to the same location in memory. By isolating accesses from different domains of operation of the processing circuitry into respective distinct physical address spaces as viewed for some memory system components, this can provide a stronger security guarantee which does not rely on the page table permission information set by an operating system or hypervisor.
In this example, the processing circuitry 16 can execute instructions in one of four security states: a non-secure security state, a secure security state, a realm security state and a root security state. The current security state indication 64 in the control registers 16 designates which security state is currently being used. Each of the four security states is associated with a corresponding architectural physical address space (PAS) 84. Hence, there are four architectural PASs: a non-secure PAS, secure PAS, realm PAS and root PAS.
The root state is the most privileged state, and is used for executing software which controls transitions to/from the other security states. The root state is able to have its virtual addresses translated from the virtual address space 80 to any of the four architectural physical address spaces. Information (NSE, NS) specified in the page table structures 74 used to control the virtual address (VA) to physical address (PA) mapping is used to control which architectural PAS is selected for a given memory access request issued in the root security state.
The non-secure state is the least privileged security state, and its memory accesses are translated by default into physical addresses in the non-secure PAS (potentially via two stages of address translation from virtual address space 80 to intermediate address space 82 and from intermediate address space to the non-secure physical address space 84).
The secure state and realm state are orthogonal security states which are not able to access each other's physical address spaces or the root physical address space, but which are able to select whether they access their own respective PAS (realm PAS for the realm state and secure PAS for the secure state) or whether they should access the non-secure PAS. Hence, information (NS) specified in the page table structures 74 used to control address translation can be used to select whether a given memory access request has its address translated into the non-secure or secure PAS (when the request is issued from the secure security state) or into the non-secure or realm PAS (when the request is issued from the realm security state).
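The PAS selection rules for the four security states can be summarised with the sketch below. The function name and the string labels are hypothetical. The polarity of NS (a value of 1 selecting the non-secure PAS) and the specific (NSE, NS) encoding used for the root state are assumptions made for illustration, since the description above only states that this information selects the PAS.

```python
def select_pas(state, ns=0, nse=0):
    """Return the architectural PAS selected for a request (illustrative only)."""
    if state == "non-secure":
        # The least privileged state always uses the non-secure PAS.
        return "non-secure"
    if state == "secure":
        # Own PAS or the non-secure PAS, selected by NS (assumed polarity).
        return "non-secure" if ns else "secure"
    if state == "realm":
        return "non-secure" if ns else "realm"
    if state == "root":
        # Assumed (NSE, NS) encoding allowing selection of any of the four PASs.
        encoding = {
            (0, 0): "secure",
            (0, 1): "non-secure",
            (1, 0): "root",
            (1, 1): "realm",
        }
        return encoding[(nse, ns)]
    raise ValueError("unknown security state")


print(select_pas("realm", ns=0))        # realm
print(select_pas("secure", ns=1))       # non-secure
print(select_pas("root", ns=1, nse=1))  # realm
```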
As shown in
Pre-PoPA memory system components, such as caches 8, 34 in the example of
Regardless of the form of the pre-PoPA memory system component, it can be useful for such a pre-PoPA memory system component to treat the aliasing physical addresses of the respective architectural address spaces 84 as if they correspond to different memory system resources, as this provides hardware-enforced isolation between the accesses issued to different physical address spaces so that information associated with one domain cannot be leaked to another domain by features such as cache timing side channels or side channels involving changes of coherency triggered by the coherency control circuitry.
In contrast, once requests pass beyond the PoPA 94, the aliasing addresses from the respective architectural PASs 84 are mapped to a single unique physical address in the hardware PAS 86. For example, if the aliasing physical addresses in the architectural PASs 84 are simply those having the same physical address value, the mapping to the hardware PAS 86 can be carried out simply by stopping using the PASID as additional address bits when looking up storage structures. As shown in
The hardware physical address space 86 may be partitioned to enable access to certain physical memory system locations only to certain architectural PASs. This could be based either on a static mapping (e.g. memory regions assigned to certain devices 20, 22 could be statically reserved only to be accessible to a certain architectural PAS) or based on a dynamic mapping defined in a control structure which can be looked up by the MMU 70 to determine whether, for a memory access request which has caused translation of the virtual address specified by the request into a particular target physical address in a target PAS 84, that target physical address is allowed to be accessed from within that target PAS 84.
The use of multiple architectural PASs 84 for addressing some pre-PoPA memory system components such as caches 8, 34 can be useful for improving security for some use cases, to enable software operating in the Realm or Secure state to be isolated (by a hardware-enforced mechanism) from untrusted software running in the Non-Secure state. Nevertheless, a given architectural PAS (e.g. the realm PAS) may support a number of pieces of software which are mutually distrusting and which may not trust the operating system or hypervisor setting the translation table structures 74 to set access permissions to prevent inappropriate access to its data by other software sharing the same architectural PAS 84. Therefore, as shown in
As shown in
Hence, as shown in
While
While
Also, while memory encryption contexts identified by MECID are described here in relation to the scheme with multiple architectural physical address spaces 84 as shown in
Hence, in general the MECID may be an identifier assigned to memory access requests to distinguish between respective memory encryption contexts associated with a given physical address space. A memory encryption/decryption engine 50 may use the MECID to select between different configurations of encryption/decryption operation and/or different key information.
The inventors have recognised that such MECIDs can provide an opportunity for efficiency savings in the snoop filters 36, 40 used for filtering snoop traffic in a data processing system 2. When the underlying data in memory is encrypted according to different encryption regimes selected based on MECID, it is unlikely that a physical address associated with one MECID would be accessed by a memory access request associated with a different MECID. The MECID can therefore be regarded as a proxy for distinguishing the particular software workload (or group of workloads) which will make use of a certain block of physical addresses, and requests issued by other workloads are very unlikely to require access to that block of physical addresses. With a snoop filter structure which tracks, at granularity of MECIDs, which MECIDs are or are not in use for cached data held in the private cache 8 of a particular caching agent 6, 22, this can support filtering of snoops to a much greater extent (for a given amount of circuit area incurred in storing snoop filter information) than would be possible with a structure specifying entries per physical address, since a single MECID-based entry may be used to control filtering of snoops for an associated block of physical addresses accessed using that MECID. Put another way, for a given amount of snoop filter performance (reduction in unnecessary snoop traffic sent to caching agents to snoop addresses which are not actually held by those caching agents), the storage overhead of a MECID-based snoop filter may be much lower than for an address-based snoop filter offering similar levels of performance. MECID-based snoop filtering does not need to replace address-based filtering entirely (although purely MECID-based snoop filters are possible). 
In some examples, both MECID-based snoop filtering information and address-based snoop filtering information may be used to identify which snoop requests can be suppressed from being sent to corresponding caching agents.
The snoop filtering circuitry 122 uses snoop filtering information 124, 126 to determine whether to transmit a snoop request to a given caching agent in response to the given memory system request, or suppress the snoop request from being sent to the given caching agent. The snoop filtering information may include address-based snoop filtering information 124 looked up based on at least the target physical address of the given memory system request. In systems supporting multiple architectural PASs 84, where the snoop filtering information is located prior to the PoPA 94, the address-based snoop filter information may also be looked up based on the PAS identifier, which may be treated as additional address bits for example. The address-based snoop filter information 124 can be maintained according to any known snoop filter approach, and may track, for at least a subset of physical addresses, which caching agents hold valid cached data for those addresses.
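As an illustrative sketch, address-based snoop filter information located prior to the PoPA could be modelled as a table keyed by the PAS identifier together with the physical address, so that aliasing addresses in different architectural PASs are tracked as separate entries. The class, its methods and the agent labels are hypothetical.

```python
class AddressSnoopFilter:
    """Hypothetical address-based snoop filter looked up by (PAS id, PA)."""

    def __init__(self):
        # (pas_id, physical_address) -> set of caching agents holding valid data.
        self.holders = {}

    def record_allocation(self, pas_id, pa, agent):
        # The PAS identifier is treated as additional address bits in the key.
        self.holders.setdefault((pas_id, pa), set()).add(agent)

    def lookup(self, pas_id, pa):
        # Returns the tracked holders, or None on a miss.
        return self.holders.get((pas_id, pa))


snoop_filter = AddressSnoopFilter()
snoop_filter.record_allocation(pas_id=0, pa=0x1000, agent="cpu0")
print(snoop_filter.lookup(0, 0x1000))  # {'cpu0'}
print(snoop_filter.lookup(1, 0x1000))  # None: same PA, different PAS
```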
The snoop filtering information also includes MECID-based snoop filtering information 126 which specifies information for distinguishing whether, for a given caching agent, a given MECID should be regarded as a snoop-not-required MECID for which snoop requests are not required to be transmitted to that given caching agent in response to a given memory system request specifying that given MECID. A number of examples of the MECID-based snoop filtering information 126 are described below.
In some examples (e.g. where the snoop filter 40 is provided at a device port 26 that acts as a gateway to a coherent device 22), it is not essential to provide address-based snoop filter information 124 and so snoop filtering may be supported based on MECID but not based on physical address. Other examples (e.g. where the snoop filter 36 at the home node 32 supports MECID-based snoop filtering) may support snoop filtering based on both MECID and physical address. In this case, while
At step 202, the snoop filtering circuitry 122 looks up the MECID-based snoop filtering information 126 based on the target MECID. Based on the target MECID and the MECID-based snoop filtering information 126 associated with a given caching agent, the snoop filtering circuitry 122 determines whether the target MECID is a snoop-not-required MECID for the given caching agent. If the target MECID is a snoop-not-required MECID for the given caching agent then at step 204 the snoop filtering circuitry 122 suppresses transmission of a snoop request to the given caching agent. If the target MECID is not a snoop-not-required MECID, then at step 206, whether the snoop request is transmitted to the given caching agent can depend on implementation choice and/or on other snoop filtering information (e.g. the address-based snoop filtering information 124 if provided).
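Steps 202 to 206 can be sketched as a single decision function. The function and parameter names are hypothetical, the set of snoop-not-required MECIDs stands in for whichever form of MECID-based snoop filtering information 126 is used, and sending the snoop when no other filter is provided is just one possible implementation choice at step 206.

```python
def should_send_snoop(target_mecid, snoop_not_required_mecids, other_filter=None):
    """Decide whether to send a snoop to one caching agent (illustrative only).

    snoop_not_required_mecids: set of MECIDs known not to need a snoop
        for this caching agent.
    other_filter: optional callable returning True if other snoop filtering
        information (e.g. address-based) says the snoop is needed.
    """
    # Step 202: look up the MECID-based information for this caching agent.
    if target_mecid in snoop_not_required_mecids:
        # Step 204: suppress the snoop request.
        return False
    # Step 206: defer to other filtering information if there is any.
    if other_filter is not None:
        return other_filter()
    # Assumed implementation choice: send the snoop when nothing rules it out.
    return True


not_required = {3, 4}
print(should_send_snoop(3, not_required))                 # False
print(should_send_snoop(9, not_required))                 # True
print(should_send_snoop(9, not_required, lambda: False))  # False
```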
Hence, with this example of tracking information, a given MECID (MECID=x) may be determined to be a snoop-not-required MECID for a given caching agent y if MECID x hits in the snoop filter information 126 (i.e. has a valid entry for which the MECID field 222 specifies MECID x) and the caching agent indicator 224 corresponding to caching agent y indicates that the caching agent y (or group of caching agents comprising caching agent y) does not hold valid data for MECID=x (e.g. in the example of
Hence, in the example of
This approach can save overhead compared to tracking each individual physical address associated with a given MECID in a snoop filter, since a single entry storing an association between a MECID and a given caching agent can suffice for making a snoop forwarding decision for any physical address associated with the MECID. While such MECID based filters may not be precise (there may be a limit to the number of MECIDs for which valid snoop forwarding information can be maintained, given the storage overheads of doing so), they can nevertheless reduce the amount of broadcasting needed for snoops and the circuit area overhead of achieving a given level of snoop broadcast reduction compared to pure-address-based snoop filtering approaches.
If the lookup misses in the address-based snoop filter information 124, then at step 256 the snoop filtering circuitry 122 of the home node snoop filter 36 looks up the MECID-based snoop filtering information 126, and if there is a hit at step 258 identifies based on the valid MECID tracking entry 220 corresponding to MECID=X which caching agents are the target caching agents to which snoop requests should be sent. Snoops can be suppressed from being sent to those caching agents not indicated as holding valid data for MECID=X (e.g. the caching agents corresponding to indicators 224 set to 0 in the example of
If the lookup for MECID=X misses in the MECID-based snoop filtering information 126 (there is no valid entry corresponding to MECID=X), then at step 260 snoop requests are broadcast to all caching agents, since there is no information available on whether any caching agents hold information for PA=Y or MECID=X.
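The fallback order at the home node (address-based information first, then MECID-based information, then broadcast) can be sketched as below. The function name, the dictionary representations of the two kinds of snoop filtering information and the agent labels are hypothetical. Using the tracked holders directly on an address-based hit, and excluding the requester from the targets, are assumptions of this sketch.

```python
def home_node_snoop_targets(pa, mecid, requester, address_filter,
                            mecid_filter, all_agents):
    """Return the set of caching agents to snoop (illustrative only).

    address_filter: dict mapping PA -> set of agents holding valid data.
    mecid_filter: dict mapping MECID -> dict of agent -> bool, where True
        means the agent may hold valid data for that MECID.
    """
    holders = address_filter.get(pa)
    if holders is not None:
        # Assumed behaviour on an address-based hit: snoop tracked holders.
        targets = set(holders)
    else:
        entry = mecid_filter.get(mecid)  # step 256
        if entry is not None:
            # Step 258: snoop only agents indicated as holding the MECID.
            targets = {a for a in all_agents if entry.get(a, False)}
        else:
            # Step 260: no information available, so broadcast.
            targets = set(all_agents)
    targets.discard(requester)
    return targets


agents = ["cpu0", "cpu1", "dev0"]
address_filter = {0x1000: {"cpu1"}}
mecid_filter = {7: {"cpu0": False, "cpu1": False, "dev0": True}}
print(sorted(home_node_snoop_targets(0x1000, 7, "cpu0", address_filter,
                                     mecid_filter, agents)))  # ['cpu1']
print(sorted(home_node_snoop_targets(0x2000, 7, "cpu0", address_filter,
                                     mecid_filter, agents)))  # ['dev0']
print(sorted(home_node_snoop_targets(0x2000, 9, "cpu0", address_filter,
                                     mecid_filter, agents)))  # ['cpu1', 'dev0']
```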
At step 274, the snoop filtering circuitry 122 looks up the given MECID=X in the MECID-based snoop filtering information 126. If a miss is detected (there is not yet a valid MECID tracking entry 220 corresponding to MECID=X), then at step 276 a victim entry is selected for reallocation to MECID=X. If an invalid MECID tracking entry 220 is available, the invalid entry can be selected as the victim entry in preference to valid entries. If there is no invalid entry 220 available, a replacement policy can be used to select which valid entry should be the victim entry (e.g. a round robin or least recently used policy may be used). The contents of the victim entry are discarded. There is no need to trigger any invalidations of information associated with the victim MECID from caching agents because the default approach taken at step 260 of
Regardless of whether the MECID=X hit or missed in the MECID-based snoop filter information 126, at step 280 the entry corresponding to MECID=X is updated to specify that caching agent Z holds valid data for MECID=X. If the entry for MECID=X has only just been allocated at step 278, all indicators 224 other than the indicator corresponding to caching agent Z may be set to indicate that these caching agents do not hold valid data for MECID=X.
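The update of the home node's MECID tracking entries on an allocation by caching agent Z can be sketched as follows. The function name and the dictionary layout are hypothetical, and evicting the oldest-allocated entry is an assumed replacement policy standing in for round robin or least recently used.

```python
def record_allocation(mecid_filter, capacity, mecid, agent, all_agents):
    """Update hypothetical MECID tracking entries after an allocation.

    mecid_filter: dict mapping MECID -> dict of agent -> bool (holds data).
    """
    entry = mecid_filter.get(mecid)  # step 274: look up MECID=X
    if entry is None:
        if len(mecid_filter) >= capacity:
            # Step 276: pick a victim; oldest-first is an assumed policy.
            victim = next(iter(mecid_filter))
            # The victim's contents are discarded without any invalidation.
            del mecid_filter[victim]
        # Step 278: a newly allocated entry starts with no agents indicated.
        entry = {a: False for a in all_agents}
        mecid_filter[mecid] = entry
    # Step 280: record that this agent holds valid data for the MECID.
    entry[agent] = True


agents = ["cpu0", "cpu1", "dev0"]
tracking = {}
record_allocation(tracking, 2, 7, "cpu0", agents)
record_allocation(tracking, 2, 8, "cpu1", agents)
record_allocation(tracking, 2, 9, "dev0", agents)
print(list(tracking))  # [8, 9]: the entry for MECID 7 was discarded
print(tracking[9])     # {'cpu0': False, 'cpu1': False, 'dev0': True}
```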
Being able to restrict the snoop traffic to a subset of valid agents is difficult to achieve when there is no precise snoop filter entry, and providing a precise address-based snoop filter may require an unacceptably high circuit area cost in providing sufficient tracking entries to be able to track allocation of each physical address within the cache 24 of the device 22.
However, if snoop requests are tagged with the MECID of the original requester of data then this information can be leveraged by a “gateway” that is located at the boundary port 26 between the host system 4 and other coherent agents 22, so that the snoop filter 40 at the port 26 can make an efficient access control decision that satisfies the security requirements. Specifically, access control lists can be built at the gateway to enforce the security policies for snoop traffic without having to track cached data at physical address granularity at the gateway 26.
Hence,
Hence, the given memory system request received at the request receiving circuitry 120 of the root port's snoop filter 40 may be a snoop request transmitted from the home node 32 to the root port 26 in response to a read/write request from another requester. For example, in
There can be a number of ways in which precise MECID-based snoop filter information can be maintained at the root port snoop filter 40. Unlike in the home node example 32, as the root port's snoop filter 40 is only tracking MECIDs in use for a specific device 22 or group of devices 22, rather than all caching agents, it becomes practical to maintain precise snoop filter information tracking each MECID which has valid cached data at the corresponding device 22 or group of devices.
As shown in
In some examples, the structure of
Alternatively, the snoop filter 40 at the gateway can autonomously manage a list of all MECIDs that might have data cached at the device. Such a list can be managed by hardware tracking memory accesses made by the device and extracting the MECID that the access was tagged with. The gateway can track individual entries (or groups of entries) in the private cache 24 of the device 22 that are associated with a given MECID using a reference counter 326 which counts the number of cache lines allocated for that MECID in the “Unique” coherency state to the private cache 24 of the coherent device 22. The counter 326 can be adjusted in one direction (e.g. incremented) when an arbitrary cache line (for any physical address) associated with the MECID is allocated in the Unique state and adjusted in the other direction (e.g. decremented) when an arbitrary line associated with the MECID is Invalidated or made “Shared”. This recognises that in a coherency protocol where any dirty data is by definition in the “Unique” state, there is no need to snoop cache lines that are in a “Shared” state. In other coherency protocols which permit a “Shared and Dirty” state, the counter 326 could instead track all valid lines associated with the corresponding MECID, not just cache lines in the Unique state. If a hardware-maintained cache line counter 326 is provided, then the valid information 322 may not need to be recorded separately from the cache line counter 326 as an entry 320 with the corresponding counter 326 indicating there are no cache lines associated with the corresponding MECID 324 in the private cache 24 of the device 22 can be considered to be invalid.
As shown in
Having selected either an invalid/zero-cache-line-indicating tracking entry at step 374 or a victim entry following an eviction at steps 376, 378, at step 380 the selected MECID tracking entry 320 is allocated for the specified MECID that was specified by the request detected at step 370 (e.g. by updating the MECID field 324 of that entry). At step 382 the cache line counter for that entry is adjusted in a first direction (e.g. incremented). If at step 372 the specified MECID already had a valid entry 320, then similarly at step 382 the cache line counter for that entry is adjusted in the first direction (e.g. incremented).
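The allocation flow at the root port can be sketched as below, with an entry whose counter is zero being treated as invalid and reused in preference to evicting a live entry. This is a software model under stated assumptions: the class and method names are hypothetical, the callback stands in for the invalidation command sent to the device, evicting the oldest-allocated entry is an assumed replacement policy, and the release method is included only to exercise the sketch.

```python
class RootPortTracker:
    """Hypothetical counter-based MECID tracker for one coherent device."""

    def __init__(self, capacity, invalidate_by_mecid):
        self.capacity = capacity
        self.invalidate_by_mecid = invalidate_by_mecid
        self.counters = {}  # MECID -> count of unique-state cache lines

    def on_unique_allocation(self, mecid):
        if mecid not in self.counters:  # step 372: no entry for this MECID
            if len(self.counters) >= self.capacity:
                self._free_entry()
            self.counters[mecid] = 0    # step 380: allocate the entry
        self.counters[mecid] += 1       # step 382: first direction

    def on_line_released(self, mecid):
        # Opposite direction when a line is invalidated or made shared.
        if self.counters.get(mecid, 0) > 0:
            self.counters[mecid] -= 1

    def is_snoop_not_required(self, mecid):
        return self.counters.get(mecid, 0) == 0

    def _free_entry(self):
        # Step 374: prefer an entry whose counter indicates no cache lines.
        zero = next((m for m, c in self.counters.items() if c == 0), None)
        if zero is not None:
            del self.counters[zero]
            return
        # Steps 376, 378: evict a victim and invalidate its cache lines.
        victim = next(iter(self.counters))
        self.invalidate_by_mecid(victim)
        del self.counters[victim]


invalidated = []
port = RootPortTracker(capacity=2, invalidate_by_mecid=invalidated.append)
port.on_unique_allocation(1)
port.on_unique_allocation(2)
port.on_line_released(1)
port.on_unique_allocation(3)  # reuses the zero-count entry for MECID 1
port.on_unique_allocation(4)  # evicts MECID 2 and invalidates its lines
print(invalidated)            # [2]
print(sorted(port.counters))  # [3, 4]
```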
As shown in
Hence, with this approach, the hardware can track the occupancy of the private cache 24 of the device 22, to track which MECIDs may require snooping, and which MECIDs are snoop-not-required MECIDs for which there cannot be any valid data in the cache 24 of the coherent device 22, so that snoops can be suppressed to provide both a security and performance benefit.
Concepts described herein may be embodied in a system comprising at least one packaged chip. The apparatus described earlier is implemented in the at least one packaged chip (either being implemented in one specific chip of the system, or distributed over more than one packaged chip). The at least one packaged chip is assembled on a board with at least one system component. A chip-containing product may comprise the system assembled on a further board with at least one other product component. The system or the chip-containing product may be assembled into a housing or onto a structural support (such as a frame or blade).
As shown in
In some examples, a collection of chiplets (i.e. small modular chips with particular functionality) may itself be referred to as a chip. A chiplet may be packaged individually in a semiconductor package and/or together with other chiplets into a multi-chiplet semiconductor package (e.g. using an interposer, or by using three-dimensional integration to provide a multi-layer chiplet product comprising two or more vertically stacked integrated circuit layers).
The one or more packaged chips 400 are assembled on a board 402 together with at least one system component 404 to provide a system 406. For example, the board may comprise a printed circuit board. The board substrate may be made of any of a variety of materials, e.g. plastic, glass, ceramic, or a flexible substrate material such as paper, plastic or textile material. The at least one system component 404 comprises one or more external components which are not part of the one or more packaged chip(s) 400. For example, the at least one system component 404 could include any one or more of the following: another packaged chip (e.g. provided by a different manufacturer or produced on a different process node), an interface module, a resistor, a capacitor, an inductor, a transformer, a diode, a transistor and/or a sensor.
A chip-containing product 416 is manufactured comprising the system 406 (including the board 402, the one or more chips 400 and the at least one system component 404) and one or more product components 412. The product components 412 comprise one or more further components which are not part of the system 406. As a non-exhaustive list of examples, the one or more product components 412 could include a user input/output device such as a keypad, touch screen, microphone, loudspeaker, display screen, haptic device, etc.; a wireless communication transmitter/receiver; a sensor; an actuator for actuating mechanical motion; a thermal control device; a further packaged chip; an interface module; a resistor; a capacitor; an inductor; a transformer; a diode; and/or a transistor. The system 406 and one or more product components 412 may be assembled on to a further board 414.
The board 402 or the further board 414 may be provided on or within a device housing or other structural support (e.g. a frame or blade) to provide a product which can be handled by a user and/or is intended for operational use by a person or company.
The system 406 or the chip-containing product 416 may be at least one of: an end-user product, a machine, a medical device, a computing or telecommunications infrastructure product, or an automation control system. As a non-exhaustive list of examples, the chip-containing product could be any of the following: a telecommunications device, a mobile phone, a tablet, a laptop, a computer, a server (e.g. a rack server or blade server), an infrastructure device, networking equipment, a vehicle or other automotive product, industrial machinery, consumer device, smart card, credit card, smart glasses, avionics device, robotics device, camera, television, smart television, DVD player, set top box, wearable device, domestic appliance, smart meter, medical device, heating/lighting control device, sensor, and/or a control system for controlling public infrastructure equipment such as smart motorway or traffic lights.
Concepts described herein may be embodied in computer-readable code for fabrication of an apparatus that embodies the described concepts. For example, the computer-readable code can be used at one or more stages of a semiconductor design and fabrication process, including an electronic design automation (EDA) stage, to fabricate an integrated circuit comprising the apparatus embodying the concepts. The above computer-readable code may additionally or alternatively enable the definition, modelling, simulation, verification and/or testing of an apparatus embodying the concepts described herein.
For example, the computer-readable code for fabrication of an apparatus embodying the concepts described herein can be embodied in code defining a hardware description language (HDL) representation of the concepts. For example, the code may define a register-transfer-level (RTL) abstraction of one or more logic circuits for defining an apparatus embodying the concepts. The code may define an HDL representation of the one or more logic circuits embodying the apparatus in Verilog, SystemVerilog, Chisel, or VHDL (Very High-Speed Integrated Circuit Hardware Description Language) as well as intermediate representations such as FIRRTL. Computer-readable code may provide definitions embodying the concepts using system-level modelling languages such as SystemC and SystemVerilog or other behavioural representations of the concepts that can be interpreted by a computer to enable simulation, functional and/or formal verification, and testing of the concepts.
Additionally or alternatively, the computer-readable code may define a low-level description of integrated circuit components that embody concepts described herein, such as one or more netlists or integrated circuit layout definitions, including representations such as GDSII. The one or more netlists or other computer-readable representation of integrated circuit components may be generated by applying one or more logic synthesis processes to an RTL representation to generate definitions for use in fabrication of an apparatus embodying the invention. Alternatively or additionally, the one or more logic synthesis processes can generate from the computer-readable code a bitstream to be loaded into a field programmable gate array (FPGA) to configure the FPGA to embody the described concepts. The FPGA may be deployed for the purposes of verification and test of the concepts prior to fabrication in an integrated circuit or the FPGA may be deployed in a product directly.
The computer-readable code may comprise a mix of code representations for fabrication of an apparatus, for example including a mix of one or more of an RTL representation, a netlist representation, or another computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus embodying the invention. Alternatively or additionally, the concept may be defined in a combination of a computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus and computer-readable code defining instructions which are to be executed by the defined apparatus once fabricated.
Such computer-readable code can be disposed in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor, magnetic disk, or optical disc. An integrated circuit fabricated using the computer-readable code may comprise components such as one or more of a central processing unit, graphics processing unit, neural processing unit, digital signal processor or other components that individually or collectively embody the concept.
In the present application, the words “configured to . . . ” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a “configuration” means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
In the present application, lists of features preceded with the phrase “at least one of” mean that any one or more of those features can be provided either individually or in combination. For example, “at least one of: [A], [B] and [C]” encompasses any of the following options: A alone (without B or C), B alone (without A or C), C alone (without A or B), A and B in combination (without C), A and C in combination (without B), B and C in combination (without A), or A, B and C in combination.
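As an illustration only, the seven options listed above are the non-empty combinations of the three listed features. The following minimal sketch enumerates them; the function name `at_least_one_of` is a hypothetical helper introduced here for illustration and is not part of the described apparatus.

```python
from itertools import combinations


def at_least_one_of(features):
    """Return every non-empty combination of the listed features.

    Hypothetical helper for illustration: models how the phrase
    "at least one of: [A], [B] and [C]" is to be read.
    """
    return [
        combo
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]


options = at_least_one_of(["A", "B", "C"])
for combo in options:
    # Each printed line is one option encompassed by the phrase.
    print(" and ".join(combo))
```

For three listed features this yields the seven options enumerated in the paragraph above: each feature alone, each pair, and all three in combination.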
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2402746.8 | Feb 2024 | GB | national |