1. Field of the Invention
This invention relates to the field of data processing systems. More particularly, this invention relates to cache memory hierarchies and snoop filtering circuitry for supporting coherence within such cache memory hierarchies.
2. Description of the Prior Art
It is known to provide data processing systems with cache memory hierarchies. In a non-inclusive mode of operation the cache memory hierarchy operates such that a single copy of a cache line of data is held. This single cache line of data may be held at, for example, level one (L1), level two (L2) or level three (L3) of the hierarchy, but not at more than one level or in more than one cache within a level. Such non-inclusive operation makes efficient use of storage capacity within the cache hierarchy, but suffers from the disadvantage of slower access to a cache line of data when this is not stored at a location close to the transaction source requesting access to that cache line of data.
Another mode of operation is the inclusive mode. In the inclusive mode of operation a cache line of data may be stored in multiple levels of the cache hierarchy and in multiple caches within a level of the cache hierarchy. This type of operation may provide more rapid access to a given line of cache data by a requesting transaction source, but suffers from the disadvantage of less efficient usage of the storage resources of the cache hierarchy.
It is known to provide snoop filter circuitry that serves to store snoop tag values within an inclusive mode of operation that identify which cache memories are storing copies of a given cache line of data such that snoop requests and accesses may be directed to those local cache memories that are storing the cache line of data targeted. A disadvantage of snoop filter circuitry is the resource consumed in terms of gate count, power, area and the like.
Viewed from one aspect the present invention provides apparatus for processing data comprising: a plurality of transaction sources, each of said plurality of transaction sources having a local cache memory; a shared cache memory coupled to said plurality of transaction sources and configured to operate in a non-inclusive mode, said shared cache memory storing shared cache tag values tracking which cache lines of data are stored in said shared cache memory; and snoop filter circuitry configured to store snoop filter tag values for tracking which cache lines of data are stored in said local cache memories; wherein in response to a transaction request to a target cache line of data having a target tag value:
(i) said shared cache memory is configured to compare said target tag value with said shared cache tag values to detect if said target cache line of data is stored in said shared cache memory; and
(ii) said snoop filter circuitry is configured to compare said target tag value with said snoop filter tag values to detect if said target cache line of data is stored in any of said local cache memories.
The present technique provides a data processing apparatus which has a shared cache memory, such as a level 3 cache memory, which operates in a non-inclusive mode. It also has snoop filter circuitry which stores snoop filter tag values tracking which cache lines of data are stored in the local cache memories. The snoop filter circuitry need not store snoop filter tag values (although in some cases discussed below it does) for cache lines of data which are held within the shared cache memory, as the shared cache tag values may instead be used to identify and locate a target cache line of data to be subject to a transaction request. This saves space within the snoop filter circuitry. The present technique may be considered to provide a system which operates non-inclusively with respect to its storage of data values (i.e. the shared cache memory non-inclusively stores cache lines of data) whilst providing tag storage on an inclusive basis (i.e. the tags of all of the cache lines of data present within the shared cache memory and the local cache memories are stored within the shared cache memory and the snoop filter circuitry respectively). This helps to reduce the volume of snoop traffic required, as the location and presence of cache lines of data may be determined from the snoop filter circuitry and the shared cache memory.
It is possible that the snoop filter circuitry may simply store snoop filter tag values indicating that a cache line of data is present within one of the local cache memories and require a broadcast to be made to all of those local cache memories. However, some embodiments may store transaction source identifying data within the snoop filter circuitry, this transaction source identifying data identifying which cache lines of data are stored in which of the cache memories. In this way it is more likely that speculative snoops can be avoided as it may be determined from the snoop filter circuitry alone what lines of cache data are stored and in which local cache memories.
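The use of transaction source identifying data to avoid broadcast snoops can be sketched as follows. This is an illustrative model only: the class and method names (`SnoopFilter`, `record`, `lookup`) are assumptions for illustration and do not appear in the present technique itself.

```python
# Hypothetical sketch: each snoop filter entry records which transaction
# sources' local caches hold a given line, so a snoop can be directed at
# the known holders rather than broadcast to every local cache memory.

class SnoopFilter:
    def __init__(self):
        self.entries = {}  # tag -> set of transaction source identifiers

    def record(self, tag, source_id):
        # Track that the local cache of source_id now holds this line.
        self.entries.setdefault(tag, set()).add(source_id)

    def lookup(self, tag):
        # Return the set of sources to snoop; an empty set means no local
        # cache memory holds the line, so no snoop need be issued at all.
        return self.entries.get(tag, set())

sf = SnoopFilter()
sf.record(0x1A0, source_id=2)
assert sf.lookup(0x1A0) == {2}    # targeted snoop, no broadcast needed
assert sf.lookup(0x2B0) == set()  # miss: no local cache holds the line
```

A non-precise variant could instead store a coarse holder set (e.g. "all even numbered sources"), trading snoop traffic for storage space.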
The transaction source identifying data may serve to identify only a single local cache memory, as this will preserve storage capacity and may be sufficient in a large number of operational circumstances. In other embodiments the transaction source identifying data may identify a proper subset of the transaction sources, with a number chosen to balance between the storage space consumed and the efficiency gains achieved. An individual item of transaction source identifying data may also be non-precise and identify a range of transaction sources, e.g. all even numbered transaction sources or all transaction sources with numbers within a certain range.
The configuration of the snoop filter circuitry may be such that it is constrained to be strictly inclusive in the sense that it must store a snoop filter tag value for each cache line of data that is stored within a local cache memory. However, it is not necessary that every stored snoop filter tag value need have a corresponding cache line of data stored within a local cache memory as it is possible that cache lines of data may be removed from a local cache memory without this being tracked in the snoop filter circuitry in some uncommon cases.
The snoop filter circuitry and the shared cache memory both compare a target tag value with their stored tag values. If either of these produces a hit, then the transaction request is serviced by the appropriate one of the local cache memories having the matching snoop filter tag value or the shared cache memory having the matching shared cache tag value.
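The hit routing just described can be sketched as below. The function name and string results are assumptions for illustration; the default non-inclusive case, in which at most one of the two structures hits, is modelled.

```python
# Illustrative sketch: the target tag value is compared against both the
# snoop filter tag values and the shared cache tag values, and the matching
# structure determines which memory services the transaction request.

def route_transaction(target_tag, snoop_filter_tags, shared_cache_tags):
    """Return which unit services the request for target_tag."""
    if target_tag in snoop_filter_tags:   # line held in a local cache memory
        return "local_cache"
    if target_tag in shared_cache_tags:   # line held in the shared cache
        return "shared_cache"
    return "main_memory"                  # miss in both structures

assert route_transaction(0x10, {0x10}, set()) == "local_cache"
assert route_transaction(0x20, set(), {0x20}) == "shared_cache"
assert route_transaction(0x30, set(), set()) == "main_memory"
```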
In some situations the default non-inclusive behaviour of the shared cache memory may be overridden selectively on a cache line by cache line basis, such that it is possible for a hit to occur in both the snoop filter circuitry and the shared cache memory, with both of these then servicing the transaction request.
Operational speed may be increased and control complexity decreased if the shared cache memory and the snoop filter circuitry are configured to perform their compare operations in parallel. In some embodiments the shared cache memory and the snoop filter circuitry may be configured to operate as interlocked pipelines when performing accesses in parallel.
If a hit occurs in neither the shared cache memory nor the snoop filter circuitry, then a transaction to the target cache line of data may be initiated to a main memory. The returned target cache line of data will normally be stored in one of the local cache memories corresponding to the source of the transaction and will not be stored in the shared cache memory (i.e. consistent with the default non-inclusive behaviour of the shared cache memory). If that target cache line of data is subsequently evicted from the local cache memory, then it may be stored within the shared cache memory at that later time.
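The fill-and-evict flow above can be modelled in a few lines. The class and method names here are hypothetical, chosen for illustration only.

```python
# Minimal model of the behaviour described above: on a miss in both the
# snoop filter and the shared cache, the line fetched from main memory is
# placed in the requester's local cache and tracked by a snoop filter tag;
# only on a later eviction does it migrate into the shared cache.

class Hierarchy:
    def __init__(self):
        self.snoop_filter = {}    # tag -> owning transaction source id
        self.shared_cache = set() # tags of lines held in the shared cache

    def fill_from_memory(self, tag, source_id):
        self.snoop_filter[tag] = source_id  # tracked as locally cached
        # Non-inclusive default: the shared cache is NOT allocated on fill.

    def evict_from_local(self, tag):
        del self.snoop_filter[tag]   # no longer tracked as locally cached
        self.shared_cache.add(tag)   # line now lives in the shared cache

h = Hierarchy()
h.fill_from_memory(0x40, source_id=0)
assert 0x40 not in h.shared_cache  # non-inclusive on fill
h.evict_from_local(0x40)
assert 0x40 in h.shared_cache      # migrates to shared cache on eviction
```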
Control complexity is advantageously reduced when the shared cache memory and the snoop filter circuitry are configured to atomically change a tag value between being stored as a snoop filter tag value in the snoop filter circuitry and being stored as a shared cache tag value in the shared cache memory, so as to follow a change in storage location for the corresponding cache line of data. Such atomic behaviour has the result that when a change is made it will be fully completed as a single operation, at least as far as the external visibility of any changes made. Thus, a subsequent transaction will not be able to observe a partially completed change in the storage location of a tag value, as this could produce erroneous or unpredictable behaviour requiring considerable control complexity to manage or prevent.
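The atomic tag handover can be sketched as below, assuming a simple lock makes the move externally indivisible. The names are illustrative assumptions; real hardware would achieve the same property with interlocked pipelines rather than a software lock.

```python
import threading

# Sketch of an atomic tag move: at no externally observable point is the
# tag present in both structures or in neither, so a subsequent transaction
# cannot observe a partially completed change of storage location.

class TagDirectory:
    def __init__(self):
        self._lock = threading.Lock()
        self.snoop_filter_tags = set()
        self.shared_cache_tags = set()

    def move_to_shared(self, tag):
        # Eviction from a local cache memory: the tag moves atomically
        # from the snoop filter to the shared cache.
        with self._lock:
            self.snoop_filter_tags.discard(tag)
            self.shared_cache_tags.add(tag)

    def move_to_local(self, tag):
        # Recall from the shared cache into a local cache: the reverse.
        with self._lock:
            self.shared_cache_tags.discard(tag)
            self.snoop_filter_tags.add(tag)
```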
Examples of atomic changes to the storage location of a tag value are when a cache line of data is evicted from one of the local cache memories to the shared cache memory or when a cache line of data is recalled from the shared cache memory to one of the local cache memories.
As previously mentioned, the shared cache memory may be configured to be controlled to selectively store one or more cache lines of data in an inclusive mode. An example of an operation which can cause this switch to inclusive mode storage for a cache line of data is a response to a transaction request having one of one or more predetermined types. As an example, a transaction of a type which will read a cache line of data to a local cache memory and not subsequently modify that cache line of data stored in the local cache memory may be such as to trigger/permit inclusive storage of that cache line of data.
In some embodiments the shared cache memory and the snoop filter circuitry may be configured to store unique status data for each cache line of data associated with one of a plurality of transaction sources, with this unique status data indicating whether that cache line of data is stored in a local cache memory of any other of the plurality of transaction sources. Thus, the unique status data indicates whether a cache line of data is uniquely stored on behalf of a single transaction source or is non-uniquely stored on behalf of multiple transaction sources.
In some embodiments the shared cache memory and the snoop filter circuitry may be configured to respond to receipt of a non-modifying read transaction from a given transaction source that hits a cache line of data stored in the shared cache memory by a different transaction source by returning the cache line of data to the given transaction source for storing in a local cache memory of the given transaction source, leaving the cache line of data stored in the shared cache memory and setting the unique status data for the cache line of data in both the shared cache memory and the snoop filter circuitry to indicate that the cache line of data is stored associated with a plurality of transaction sources. In this way, the shared cache memory is switched to operating in an inclusive mode in respect of at least the cache line of data for which the non-modifying read transaction was received.
In other embodiments (including in combination with the above) the shared cache memory and the snoop filter circuitry may be configured to respond to receipt of a non-modifying read transaction from a given transaction source that misses in the shared cache memory and hits a cache line of data stored in a local cache memory of a different transaction source by returning the cache line of data to the given transaction source for storing in a local cache memory of the given transaction source, leaving the cache line of data stored in the local cache memory of the different transaction source, storing the cache line of data in the shared cache memory and setting the unique status data in both the shared cache memory and the snoop filter circuitry for the cache line of data to indicate that the cache line of data is stored associated with a plurality of transaction sources. Again this switches operation to an inclusive mode for the cache line of data concerned.
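The first of these inclusive transitions (a non-modifying read hitting in the shared cache) can be sketched as below. The function and field names are assumptions for illustration; a simple dictionary stands in for each tag structure.

```python
# Hypothetical sketch of the transition above: a non-modifying read that
# hits in the shared cache leaves the line there, returns a copy to the
# requester's local cache, and clears the "unique" status in both the
# shared cache and the snoop filter to record inclusive storage.

def read_clean_hit_shared(tag, requester, shared_cache, snoop_filter):
    """shared_cache/snoop_filter map tag -> {'unique': bool, 'holders': set}."""
    line = shared_cache[tag]
    line['unique'] = False                 # now held in more than one place
    entry = snoop_filter.setdefault(tag, {'unique': True, 'holders': set()})
    entry['holders'].add(requester)        # local copy tracked by the filter
    entry['unique'] = False
    return tag  # data returned to the requester's local cache memory

shared = {0x80: {'unique': True, 'holders': set()}}
snoop = {}
read_clean_hit_shared(0x80, requester=3, shared_cache=shared, snoop_filter=snoop)
assert shared[0x80]['unique'] is False  # inclusive in respect of this line
assert 3 in snoop[0x80]['holders']
```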
When a cache line of data is fetched from a memory in response to a transaction request from one of the plurality of transaction sources, the cache line of data will be stored in the local cache memory of one of the plurality of transaction sources and a corresponding snoop filter tag value will be stored within the snoop filter circuitry. If a transaction source identifying data value is also stored by the snoop filter circuitry, then this may be written at the same time.
It will be appreciated that the plurality of transaction sources can have a variety of different forms. In one form these include one or more processor cores. The local cache memories may similarly have a variety of different forms such as including an L1 cache memory and an L2 cache memory.
The plurality of transaction sources may be conveniently connected together via a ring-based interconnect. Such a ring-based interconnect is efficiently scaled as more transaction sources or more shared cache memories are added to the system.
Viewed from another aspect the present invention provides apparatus for processing data comprising: a plurality of transaction source means for generating transactions, each of said plurality of transaction source means having local cache memory means for storing data; shared cache memory means for storing data, said shared cache memory means being coupled to said plurality of transaction source means and configured to operate in a non-inclusive mode, said shared cache memory means storing shared cache tag values tracking which cache lines of data are stored in said shared cache memory means; and snoop filter means for storing snoop filter tag values for tracking which cache lines of data are stored in said local cache memory means; wherein in response to a transaction request to a target cache line of data having a target tag value:
(i) said shared cache memory means is configured to compare said target tag value with said shared cache tag values to detect if said target cache line of data is stored in said shared cache memory means; and
(ii) said snoop filter means is configured to compare said target tag value with said snoop filter tag values to detect if said target cache line of data is stored in any of said local cache memory means.
Viewed from a further aspect the present invention provides a method of processing data comprising the steps of: generating transactions with a plurality of transaction sources; storing respective data in a local cache memory of each of said plurality of transaction sources; storing data in a shared cache memory coupled to said plurality of transaction sources; operating said shared cache memory in a non-inclusive mode; storing in said shared cache memory shared cache tag values tracking which cache lines of data are stored in said shared cache memory; storing in snoop filter circuitry snoop filter tag values for tracking which cache lines of data are stored in said local cache memories; and in response to a transaction request to a target cache line of data having a target tag value:
(i) comparing said target tag value with said shared cache tag values to detect if said target cache line of data is stored in said shared cache memory; and
(ii) comparing said target tag value with said snoop filter tag values to detect if said target cache line of data is stored in any of said local cache memories.
The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings.
When a processor core issues a transaction seeking to access a cache line of data, a determination is made as to whether or not this cache line data is stored within the local cache memory associated with that processor core. If there is a miss in the local cache memory, then a transaction is sent via the ring-based interconnect 12 to the snoop filter circuitry 14 and the shared cache memory 16. The snoop filter circuitry 14 and the shared cache memory 16 perform a parallel pipeline-interlocked tag look up of the target tag value associated with the transaction sent on the ring-based interconnect 12. This lookup comprises comparing the target tag value with snoop filter tag values stored within the snoop filter circuitry 14 and with shared cache tag values stored within the shared cache memory 16.
If a hit occurs within the shared cache memory 16, then the transaction is serviced by the shared cache memory 16. If a hit occurs within the snoop filter circuitry 14, then the snoop filter circuitry returns signals confirming that the target cache line of data is stored within one of the local cache memories within the system-on-chip integrated circuit 4 and identifying this local cache memory (using transaction source identifying data stored in association with the snoop filter tag value). If misses occur in both the snoop filter circuitry 14 and the shared cache memory 16, then the memory controller 18 initiates an off-chip memory access to the main memory 6 in respect of the target cache line of data.
The returned target cache line of data from the main memory 6 is stored back into the local cache memory of the transaction source which requested that target cache line of data. A corresponding snoop filter tag entry is written into the snoop filter circuitry 14 to identify, via its snoop filter tag value, the memory location of the cache line of data stored back into the local cache memory, together with a transaction source identifying data value identifying which transaction source has the local cache memory storing that target cache line of data.
The shared cache memory 16 operates predominantly non-inclusively in that the default behaviour is that a cache line of data will either be stored in the shared cache memory 16 or in one of the local cache memories, but not in both. This default non-inclusive behaviour can be overridden in certain circumstances. In particular, the transaction type of the transaction seeking a target cache line of data may be identified, and if this matches one or more predetermined types then an inclusive mode of storage in respect of that target cache line of data may be triggered. In particular, if the transaction is a read that will not subsequently modify the cache line of data (a read_clean) and a hit occurs within the shared cache memory 16, then the target cache line of data may be returned to the transaction source which requested it for storage in the local cache memory of that transaction source whilst remaining stored within the shared cache memory 16. A snoop filter tag value will be written into the snoop filter circuitry 14 to track the presence of the target cache line of data within the transaction source. Thus, the snoop filter tag value and a shared cache tag value will both be tracking the same cache line of data and will indicate its presence in more than one place.
Another circumstance in which the non-inclusive behaviour may be switched to inclusive behaviour for a given cache line of data is when a read transaction that will not subsequently modify a cache line of data misses within the shared cache memory 16 but is indicated by the snoop filter circuitry 14 as hitting within one of the local cache memories. In this case the target cache line of data will be retrieved from the local cache memory in which it is stored and a copy placed in both the shared cache memory 16 and the local cache memory of the transaction source which requested that target cache line of data. In this circumstance the target cache line of data will end up stored within three different places within the system, i.e. within two local cache memories and within the shared cache memory 16. The switch to inclusive mode behaviour in respect of such cache lines may be tracked by the use of unique status data in the form of a unique/non-unique flag stored in respect of each cache line of data being tracked, with flag values stored within the snoop filter circuitry 14 and the shared cache memory 16.
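This second inclusive transition, which leaves the line stored in three places, can be sketched as follows. The function and parameter names are hypothetical, introduced only for illustration.

```python
# Minimal model of the transition above: a non-modifying read that misses
# in the shared cache but hits in another source's local cache ends with
# the line held in two local caches and in the shared cache.

def read_clean_hit_local(tag, requester, owner, shared_tags, local_holders):
    local_holders.setdefault(tag, set()).add(owner)  # already held by owner
    local_holders[tag].add(requester)  # copy returned to the requester
    shared_tags.add(tag)               # copy also placed in the shared cache
    return len(local_holders[tag]) + 1 # total copies now in the system

shared_tags = set()
holders = {0xC0: {1}}  # source 1's local cache already holds the line
copies = read_clean_hit_local(0xC0, requester=2, owner=1,
                              shared_tags=shared_tags, local_holders=holders)
assert copies == 3          # two local cache memories + the shared cache
assert 0xC0 in shared_tags
```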
When a cache line is being stored in accordance with a non-inclusive mode of operation, then the unique status data will indicate that only a single copy of this cache line of data is stored. When a cache line of data is being held in an inclusive mode of operation, then the unique status data will indicate this.
As discussed above, it is possible in certain circumstances to switch cache lines from being stored in the non-inclusive mode to being stored in the inclusive mode. When these circumstances arise, action D illustrates how a tag value is copied between the snoop filter circuitry 14 and the shared cache memory 16 without being removed from its original location. This results in the same tag value being stored both as a snoop filter tag value within the snoop filter circuitry 14 and as a shared cache tag value within the shared cache memory 16.
The system-on-chip integrated circuit 4 may support types of operations such as partial power down in which circumstances it may be desirable to flush the contents of a local cache memory up to the shared cache memory 16. Such cache maintenance operations may be performed and are illustrated by action E in
As previously mentioned the operation of the actions A, B, C, D and E illustrated in
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.
UK Search Report dated Oct. 5, 2012 in GB 1210130.9.
Publication: US 20130042078 A1, Feb. 2013, United States.