Embodiments of the invention relate to techniques for maintaining data coherency. More particularly, embodiments of the invention relate to techniques to reduce snoops used to maintain data coherency, which can result in a reduction in system power consumption.
In a multi-core system (e.g., a system on chip, SoC), power management can be a driving design consideration. One way to reduce power consumption is to lower the operating frequency of one or more components of the system (e.g., a processing core). When all components of the system run at high frequency, power consumption is higher. Typically, the memory access path is the critical path when operating at high frequency.
When conserving power by reducing operating frequency, the critical path can become the snoop path. Thus, managing the snoop path efficiently can be important in managing power consumption.
Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
a illustrates one embodiment of snoop filter entries for a line-based snoop filter.
b illustrates one embodiment of snoop filter entries for a region-based snoop filter.
a illustrates one embodiment of a line-based snoop filter lookup operation.
b illustrates one embodiment of a region-based snoop filter lookup operation.
a illustrates one embodiment of a line-based snoop filter update operation.
b illustrates one embodiment of a region-based snoop filter update operation.
In the following description, numerous specific details are set forth. However, embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.
Snoop filters can be utilized to avoid sending unnecessary snoop traffic, thereby removing snoop traffic from the critical path and reducing the amount of traffic and cache activity in the overall system. This can mitigate the performance impact and reduce power consumption. Described herein are techniques for providing configurable snoop filtering that can be used to manage snoop traffic more efficiently, resulting in improved performance and more efficient power consumption.
The techniques described herein can operate to reduce the amount of traffic generated by snoop requests in a coherent system in a flexible way, providing an improved performance/power consumption balance as compared to previous strategies. The techniques described herein can also reduce the design time of a coherent system by providing parameters that are configurable at both run time and compile time.
Parameters such as, for example, memory size, write-back policies, inclusive/non-inclusive modes of operation, and line-based or region-based tracking can be configured to adapt filtering to the workloads and/or performance requirements of a coherent system. Bloom filters may also be used to reduce the amount of memory utilization compared to conventional snoop filtering.
The example of
During normal operation, coherent system fabric 150 routes traffic between system agents. The traffic includes snoops in response to cache requests. As discussed above, snoop requests consume power, so a reduction in snoop requests can result in reduced power consumption.
Snoop filter 155 operates to intelligently reduce the number of snoops that are transmitted to the system agents. Various embodiments of snoop filters and operation of the snoop filters are provided in greater detail below. System memory 170 is coupled with the system agents via coherent system fabric 150.
Snooping maintains coherence between caches 114, 124 and 134. If one of cache memories 114, 124 or 134 requests data, coherent system fabric 150 operates to send snoop requests to the other cache memories. Snoop filter 155 operates to filter the snoop requests so that cache memories that do not have a copy of the requested data do not receive a snoop request.
In one embodiment, snoop filter 155 keeps track of every request received by the coherent system. Snoop filter 155 uses this information to generate snoop requests to the caching agents efficiently. In one embodiment, snoop filter 155 can be configured to operate in one or more of the following five modes: 1) line-based inclusive snoop filtering (ISF); 2) line-based non-inclusive snoop filtering (NSF); 3) region-based non-inclusive snoop filtering (RBNSF); 4) region-based non-inclusive snoop filtering with bloom filtering (RBNSF+BF); and 5) bloom filtering.
Snoop replies are stored in reply collector 210 prior to being provided to snoop filter 220. The snoop replies come from various system agents having cache memories. Snoop filter 220 also receives requests from agents. The requests are also provided to snoop request generator 270 to generate snoop requests to the other cache memories in the system.
In one embodiment, snoop request generator 270 generates a snoop request for all caches in the system other than the one generating the request. Snoop filter 220 operates to filter out the snoop requests to caches not having a copy of the requested data by tracking the requests with snoop filter (SF) directory 225 and determining which caches in the system have a copy of the requested data. This filtering can be further augmented by bloom filter 230, which is described in greater detail below.
In one embodiment, output from SF directory 225 and bloom filter 230 is provided to multiplexor 240. In one embodiment, selection by multiplexor 240 is controlled by a hit/miss signal from SF directory 225. The output signal from multiplexor 240 is used to enable one or more snoop requests staged in snoop request buffers 280. This operates to enable snoop requests to only the caches that have a copy of the requested data.
In one embodiment, SF directory 225 is an array organized in set-associative ways, where each entry of a way may contain one or more valid bits (depending on configuration, either line or region based), agent status bits and a tag field. In one embodiment, a valid bit indicates whether the corresponding entry has valid information, the agent status bits indicate whether the cache line is present in the corresponding caching agent, and the tag field is used to save the tag of the cache line. In one embodiment, the agent status bits are configured as a bit mask for the various cache memories.
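The entry layout just described can be sketched as a simple data structure. This is an illustrative model only; the names (`SFEntry`, `NUM_SETS`, `NUM_WAYS`) and field widths are assumptions, not part of the described embodiment.

```python
from dataclasses import dataclass

# Hypothetical sketch of one SF directory entry in line-based mode.
@dataclass
class SFEntry:
    valid: bool = False  # entry holds valid tracking information
    agents: int = 0      # bit mask: bit i set => caching agent i may hold the line
    tag: int = 0         # tag of the tracked cache line

# A set-associative directory is then a table of sets, each holding N ways.
NUM_SETS, NUM_WAYS = 64, 4
directory = [[SFEntry() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]
```

In region-based mode, the same layout would carry one valid bit per line in the region rather than a single bit per entry.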
In one embodiment, snoop filter directory 225 can be configured at compile time as inclusive line-based, non-inclusive line-based or non-inclusive region-based.
a illustrates one embodiment of snoop filter entries for a line-based snoop filter. In line-based mode, each SF directory entry contains a valid bit, agent status bits and a tag for the cache line.
The snoop filter directory provides at least two operations: lookup and update. During lookup, the request address is searched in the directory; depending on the lookup result (i.e., hit or miss), the corresponding buffer enable bits are sent to the snoop request buffers (280 in
a illustrates one embodiment of a line-based snoop filter lookup operation. The snoop filter selects the way in which a hit occurs. If the result is a miss, snoops are sent to all caches in non-inclusive mode, or the bloom filter result is used.
b illustrates one embodiment of a region-based snoop filter lookup operation. The snoop filter selects the way in which a hit occurs. If the result is a miss, snoops are sent to all caches in non-inclusive mode, or the bloom filter result is used.
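The lookup flow above can be sketched as follows. This is a minimal model of the described behavior; the function name, the `Entry` layout, and the mask encoding are illustrative assumptions.

```python
from collections import namedtuple

# Assumed entry layout: valid bit, per-agent presence mask, and tag.
Entry = namedtuple("Entry", "valid agents tag")

def sf_lookup(sf_set, tag, requester_bit, num_agents, bloom_mask=None):
    """Sketch of the lookup operation: on a hit, snoop only the agents
    marked present (minus the requester); on a miss, snoop all other
    caches, or use the bloom-filter estimate when one is configured."""
    for way in sf_set:
        if way.valid and way.tag == tag:
            return way.agents & ~requester_bit  # hit in this way
    mask = bloom_mask if bloom_mask is not None else (1 << num_agents) - 1
    return mask & ~requester_bit                # miss
```

The returned mask corresponds to the buffer enable bits sent to the snoop request buffers: a set bit enables a snoop request to that caching agent.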
a illustrates one embodiment of a line-based snoop filter update operation. The snoop filter updates the way where the hit occurs. If the result is a miss, an invalid line is updated; if there are no invalid lines, the least recently used (LRU) way is updated.
b illustrates one embodiment of a region-based snoop filter update operation. The snoop filter updates the way where the hit occurs. If the result is a miss, an invalid line is updated; if there are no invalid lines, the least recently used (LRU) way is updated.
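The update flow can be sketched in the same style. Again this is an illustrative model under assumptions: a per-way age counter stands in for the LRU state, and the names are hypothetical.

```python
# Minimal mutable entry for the sketch; `lru` is an age counter (assumption).
class Entry:
    def __init__(self):
        self.valid, self.agents, self.tag, self.lru = False, 0, 0, 0

def sf_update(sf_set, tag, requester_bit):
    """Sketch of the update operation: refresh the way on a hit,
    otherwise fill an invalid way, otherwise replace the LRU way."""
    victim = None
    for way in sf_set:
        if way.valid and way.tag == tag:
            victim = way                      # hit: update the matching way
            break
    if victim is None:                        # miss: prefer an invalid way
        invalid = [w for w in sf_set if not w.valid]
        victim = invalid[0] if invalid else max(sf_set, key=lambda w: w.lru)
        victim.tag, victim.agents, victim.valid = tag, 0, True
    victim.agents |= requester_bit            # requester now holds the line
    for way in sf_set:                        # age all ways, reset the touched one
        way.lru += 1
    victim.lru = 0
```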
The counters are used to save statistical information about the number of times that a key address has been generated. Based on the statistical information, the bloom filter generates snoop requests. If the key generated by the requested address points to a counter that is zero, the corresponding caching agent does not have a copy of the cache line. If the key generated by the requested address points to a counter that is non-zero, the corresponding caching agent may have a copy of the cache line.
The bloom filter also provides lookup and update operations.
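The counter behavior described above matches a counting bloom filter, which can be sketched as below. The counter count, hash function, and class name are assumptions for illustration; in the described system, one such filter would track the lines held by a given caching agent.

```python
import hashlib

class CountingBloomFilter:
    """Sketch of a counting bloom filter per caching agent (assumed sizing)."""
    def __init__(self, num_counters=256, num_hashes=2):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _keys(self, addr):
        # Derive num_hashes counter indices (keys) from the address.
        for i in range(self.num_hashes):
            h = hashlib.blake2b(f"{i}:{addr}".encode(), digest_size=4).digest()
            yield int.from_bytes(h, "big") % len(self.counters)

    def insert(self, addr):   # agent allocated the line: count the keys
        for k in self._keys(addr):
            self.counters[k] += 1

    def evict(self, addr):    # agent evicted the line: uncount the keys
        for k in self._keys(addr):
            if self.counters[k] > 0:
                self.counters[k] -= 1

    def may_have(self, addr):
        # Any zero counter => the agent definitely does not hold the line;
        # all non-zero => the agent may hold it (snoop is required).
        return all(self.counters[k] > 0 for k in self._keys(addr))
```

The zero/non-zero test mirrors the behavior described above: a zero counter filters the snoop out entirely, while a non-zero counter conservatively lets it through.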
The snoop filtering techniques and mechanisms described above (including the bloom filter) provide configurability and flexibility that is not available in the prior art, and a more efficient snoop technique that ultimately results in a more resource-efficient system, which provides advantages such as lower power consumption.
The following parameters may be configured at compile/design time:
The following parameters may be configured at run time:
These examples are just one embodiment and other embodiments can have different combinations of parameters that can be configured during compile time, runtime or both based on different implementations.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.