In modern multi-processor systems that employ inclusive cache systems, processor cache memories often maintain multiple copies of data. In an inclusive cache system, when one processor alters one copy of the data, it is necessary to update or invalidate all other copies of the data which may appear elsewhere in the multi-processor system. Thus, in a system that employs an inclusive cache, to ensure coherency among the multiple copies of the data, every valid write to one copy of the data must update or invalidate all other copies of the data.
A non-inclusive cache is typically implemented as a shared cache in a multi-level cache hierarchy with at least one cache of the multi-level cache hierarchy at a lower level (e.g., closer proximity to an associated processor) than the non-inclusive shared cache. Additionally, the lower level cache can have at least one coherent copy of cache line data that is not stored in the non-inclusive shared cache.
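The inclusion property that separates the two kinds of hierarchy can be stated as a simple set check. The following Python sketch is illustrative only. It assumes that cache contents are modelled as sets of line addresses, and the function names are hypothetical rather than part of the described system.

```python
from typing import Dict, Set


def lines_missing_from_shared(shared: Set[int], local_caches: Dict[str, Set[int]]) -> Set[int]:
    """Return line addresses held by some lower-level cache but absent from the shared cache."""
    held_locally: Set[int] = set().union(*local_caches.values())
    return held_locally - shared


def is_inclusive(shared: Set[int], local_caches: Dict[str, Set[int]]) -> bool:
    """An inclusive shared cache holds every line held by any lower-level cache."""
    return not lines_missing_from_shared(shared, local_caches)


# Line 0x180 sits in cpu0's local cache but not in the shared cache,
# so this hierarchy is non-inclusive.
shared = {0x100, 0x140}
local = {"cpu0": {0x100, 0x180}, "cpu1": {0x140}}
print(is_inclusive(shared, local))  # False
```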
The cache agent 8 can include a controller 18 that processes requests received from one of the N processors 4 or the other nodes 16. As one example, a request could be a request to read data from the shared cache memory 10 (e.g., a “read request”). The read request could be issued by one of the N processors 4 (e.g., a requesting processor) when that processor 4 incurs a cache miss while attempting to read or write data at its respective local cache 6. The controller 18 could allow the read request to access the shared cache memory 10 if a level of activity of the cache agent 8 and/or the shared cache memory 10 is below a predetermined level. The level of activity could be determined, for example, by examining the size of the read and/or write queues to the shared cache memory 10. It is to be appreciated that a variety of methodologies and/or systems could be employed to determine the level of activity. The predetermined level of activity could be programmable, and could vary based on the type of request for access to the shared cache memory 10. The types of requests could include, for example, a read request, a write request or a snoop, to name a few. If the shared cache memory 10 contains a coherent copy of the requested data, the cache agent 8 forwards the coherent copy of the requested data to the requesting processor 4. However, if the level of activity of the cache agent 8 and/or the shared cache memory 10 is at or above the predetermined level of activity, the read request could be denied, such that the cache agent 8 forwards the read request to the system memory 14 and/or the other nodes 16 via the system interconnect 12.
Additionally, the read request can be forwarded to the system memory 14 and/or the other nodes 16 if the shared cache memory 10 does not contain a coherent copy of the requested data. In response to the forwarded read request, a response is typically issued to the cache agent 8 via the system interconnect 12, and the cache agent 8 can forward that response to the requesting processor 4.
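The read-request handling described above can be sketched as follows. This is a minimal model under stated assumptions: the level of activity is taken to be the combined read and write queue depth, the shared cache is a dictionary of coherent lines, and `activity_level`, `handle_read_request` and `read_threshold` are hypothetical names.

```python
from typing import Dict, Optional, Tuple


def activity_level(read_queue_depth: int, write_queue_depth: int) -> int:
    """Hypothetical activity metric: combined depth of the queues to the shared cache."""
    return read_queue_depth + write_queue_depth


def handle_read_request(
    addr: int,
    shared_cache: Dict[int, bytes],
    activity: int,
    read_threshold: int,
) -> Tuple[str, Optional[bytes]]:
    """Model of controller 18 deciding where a read request is serviced."""
    if activity < read_threshold and addr in shared_cache:
        # Admitted and a coherent copy is present: serve from shared cache memory 10.
        return "shared_cache", shared_cache[addr]
    # Denied (activity at or above the level) or no coherent copy:
    # forward to system memory 14 and/or other nodes 16 over interconnect 12.
    return "system_interconnect", None


cache = {0x40: b"line-40"}
print(handle_read_request(0x40, cache, activity_level(1, 2), read_threshold=8))
print(handle_read_request(0x40, cache, activity_level(5, 3), read_threshold=8))
```

Because the threshold is a parameter, a separate programmable value could be passed per request type, as the text allows.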
In another example, the request for access could be implemented as a writeback of data (writeback data) associated with one or more cache lines of a processor's local cache 6. A writeback request could be performed by one of the N processors 4 (“the writing processor” 4), for example, during a cache line eviction procedure. The controller 18 could allow the writeback request to access the shared cache memory 10 if the level of activity of the cache agent 8 and/or the shared cache memory 10 is below the predetermined level of activity discussed above. If the controller 18 allows the writeback request to access the shared cache memory 10, the controller 18 can allocate one or more cache lines of the shared cache memory 10 for storage of the writeback data, and the cache agent 8 writes the writeback data to the shared cache memory 10. The cache agent 8 also provides the writing processor 4 with a completion signal indicating that the writeback is complete. It is to be understood that the completion signal could be provided to the writing processor 4 before the writeback data is written to the shared cache memory 10.
Alternatively, if the level of activity is at or above the predetermined level, the cache agent 8 can deny the writeback request. If the writeback request is denied, the cache agent 8 typically forwards the writeback request to the system memory 14 via the system interconnect 12. Typically, the system memory 14 can respond to the writeback request by providing a system memory completion signal to the cache agent 8, and the cache agent 8 can provide the completion signal discussed above to the writing processor 4. It is to be understood that the completion signal could be provided to the writing processor 4 before the system memory completion signal is provided to the cache agent 8.
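The two writeback paths can be sketched in the same style. This sketch assumes dictionaries stand in for the shared cache memory 10 and the system memory 14, and it logs signals to a list. The function name and signal strings are hypothetical, and the logged order is only one of the orderings the text permits.

```python
from typing import Dict, List


def handle_writeback(
    addr: int,
    data: bytes,
    shared_cache: Dict[int, bytes],
    system_memory: Dict[int, bytes],
    activity: int,
    threshold: int,
    events: List[str],
) -> str:
    """Model of cache agent 8 accepting or denying a writeback request."""
    # The completion signal may precede the actual data write, so it is logged first.
    events.append("completion -> writing processor")
    if activity < threshold:
        # Admitted: allocate line(s) in shared cache memory 10 and store the data.
        shared_cache[addr] = data
        return "shared_cache"
    # Denied: forward the writeback to system memory 14 over interconnect 12,
    # which then answers with its own completion signal.
    system_memory[addr] = data
    events.append("system memory completion -> cache agent")
    return "system_memory"


shared, memory, log = {}, {}, []
handle_writeback(0x40, b"dirty", shared, memory, activity=2, threshold=8, events=log)
handle_writeback(0x80, b"dirty", shared, memory, activity=9, threshold=8, events=log)
```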
In yet another example, the request for access could be implemented as a snoop for data at the cache (e.g., a “snoop”). The snoop could be issued by one of the other nodes 16 via the system interconnect 12. The controller 18 could allow the snoop to access the shared cache memory 10 if the level of activity of the cache agent 8 and/or the shared cache memory 10 is below the predetermined level of activity discussed above. If the shared cache memory 10 contains a coherent copy of the requested data, and the coherent copy is exclusively owned by the shared cache memory 10, the cache agent 8 can forward the coherent copy of the requested data to the other node 16 via the system interconnect 12. However, if the level of activity of the cache agent 8 and/or the shared cache memory 10 is at or above the predetermined level of activity, and the local caches 6 associated with the N processors 4 do not contain a coherent copy of the requested data, the snoop could be denied access, such that the cache agent 8 responds to the snoop with an invalid signal indicating that the shared cache memory 10 does not contain a coherent copy of the requested data.
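The snoop handling above can be sketched as a small decision function. The sketch assumes `exclusive_lines` holds only the coherent copies exclusively owned by the shared cache memory 10. Two cases the text does not describe (an admitted snoop with no exclusive copy, and a denied snoop while a local cache 6 holds the line) are returned as placeholder results and are assumptions, not part of the described behavior.

```python
from typing import Dict, Optional, Tuple


def handle_snoop(
    addr: int,
    exclusive_lines: Dict[int, bytes],
    local_caches_hold_copy: bool,
    activity: int,
    threshold: int,
) -> Tuple[str, Optional[bytes]]:
    """Model of cache agent 8 answering a snoop from another node 16."""
    if activity < threshold:
        if addr in exclusive_lines:
            # Admitted and exclusively owned: send the coherent copy to the snooping node.
            return "data", exclusive_lines[addr]
        # Assumption: this case is not described in the text; report a miss here.
        return "miss", None
    if not local_caches_hold_copy:
        # Denied and no local cache 6 holds the line: answer with an invalid signal.
        return "invalid", None
    # Assumption: the text leaves this case open; flag it for other handling.
    return "unspecified", None
```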
The shared cache agent 54 can include a processor interface 60 that processes read and write requests from N processors 62, where N is an integer greater than or equal to one. The N processors 62 could be implemented on one or more integrated circuit chips. Each of the N processors 62 can include at least one local cache 64. The local cache 64 associated with each processor 62 can be closer in proximity to the associated processor 62 than the shared cache memory 56. Additionally, the local caches 64 could include one or more cache lines of data not stored in the non-inclusive cache system 52. The processor interface 60 can include a shared access pipeline (S.A.P.) 66 that controls operations performed by the shared cache agent 54. The processor interface 60 can also access a plurality of shared cache tags 68 that can identify the coherency state of one or more (typically four) cache lines, a shared cache memory address location, and the owner or owners (e.g., a cache 64 associated with one or more of the processors 62, or the shared cache memory 56) of the respective cache line. As an example, the shared cache tags 68 can employ a modified, exclusive, shared and invalid (MESI) coherency protocol that can indicate whether a particular cache line is: exclusively owned and unmodified (‘E’ state); exclusively owned, unmodified and not stored by any of the caches 64 associated with the N processors 62 (e.g., uncached) (‘Eu’ state); exclusively owned and modified (‘M’ state); exclusively owned, modified and not stored by any of the caches 64 associated with the N processors 62 (e.g., uncached) (‘Mu’ state); shared by at least two different caches (‘S’ state); or invalid (‘I’ state).
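The six tag states can be modelled as an enumeration. The sketch below derives a state from four boolean flags. These flags (`valid`, `shared`, `modified`, `in_processor_cache`) are hypothetical bookkeeping chosen for illustration; the text does not specify how the shared cache tags 68 encode the states.

```python
from enum import Enum


class LineState(Enum):
    E = "exclusive, unmodified"
    EU = "exclusive, unmodified, uncached by processor caches 64"
    M = "exclusive, modified"
    MU = "exclusive, modified, uncached by processor caches 64"
    S = "shared by at least two caches"
    I = "invalid"


def tag_state(valid: bool, shared: bool, modified: bool, in_processor_cache: bool) -> LineState:
    """Derive a tag state from hypothetical per-line flags."""
    if not valid:
        return LineState.I
    if shared:
        return LineState.S
    if modified:
        # 'Mu' means no processor cache 64 holds the modified line.
        return LineState.M if in_processor_cache else LineState.MU
    # 'Eu' means no processor cache 64 holds the unmodified line.
    return LineState.E if in_processor_cache else LineState.EU
```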
The processor interface 60 can also communicate with a system interface 70. The system interface 70 can process, such as through a plurality of buffers, data transfers on a system interconnect 72 between the shared cache memory 56 and/or one or more of the N processors 62, and a system memory 74 or other nodes 76. The system memory 74 could be implemented, for example, as static RAM or dynamic RAM. The other nodes 76 could be implemented, for example, as additional processors that could also include one or more associated caches.
At a DECISION POINT, when the shared cache memory 56 stores a coherent copy of the data (SHARED CACHE HIT), the shared access pipeline 66 makes a determination (based on a condition discussed below) of whether to service the READ MISS from the shared cache memory 56. If the determination is positive, the data requested by the READ MISS can be read (DATA READ) from the shared cache memory 56 and forwarded to the processor 62 (CACHE DATA).
Alternatively, if the determination is negative at the DECISION POINT, the processor interface 60 can issue a SYSTEM INTERFACE READ MISS at the system interconnect 72, even if the shared cache tag returned to the processor interface 60 indicates that the shared cache memory 56 contains a coherent copy of the data (e.g., SHARED CACHE HIT). The processor interface 60 then typically receives a response to the SYSTEM INTERFACE READ MISS that includes a coherent copy of the data requested by the processor 62 (SYSTEM INTERFACE DATA). The SYSTEM INTERFACE DATA can be forwarded to the processor 62 (DATA RETURN) by the processor interface 60.
The determination at the DECISION POINT can be made based on at least the following condition, which could be based on an examination of a current level of activity of the non-inclusive cache system 52. The level of activity of the non-inclusive cache system 52 can be determined, for example, by the shared access pipeline 66 examining the size of one or more I/O data queues at the shared cache memory controller 58, the shared cache memory 56 and/or the system interface 70. If the level of activity of the non-inclusive cache system 52 is determined to be below a programmable threshold value, the condition could be positive, but if the level of activity is determined to be at or above the programmable threshold value, the condition could be negative. If the condition is negative, the determination at the DECISION POINT can be negative. Alternatively, if the condition is positive, the determination at the DECISION POINT can be positive. It will be appreciated by those skilled in the art that other combinations of conditions, and/or other conditions, can be considered.
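The read-hit decision reduces to one threshold comparison. The sketch below assumes the level of activity is the sum of the I/O queue depths at the three locations named above, and it returns the resulting signal sequence. The names `activity_level` and `read_hit_flow` are hypothetical.

```python
from typing import List, Sequence


def activity_level(queue_depths: Sequence[int]) -> int:
    """Hypothetical metric: total occupancy of the I/O data queues at the shared cache
    memory controller 58, the shared cache memory 56 and the system interface 70."""
    return sum(queue_depths)


def read_hit_flow(queue_depths: Sequence[int], threshold: int) -> List[str]:
    """Return the signal sequence for a READ MISS that hits in shared cache memory 56."""
    if activity_level(queue_depths) < threshold:
        # Condition positive: service the request from the shared cache.
        return ["DATA READ", "CACHE DATA"]
    # Condition negative: bypass the busy shared cache and fetch over interconnect 72.
    return ["SYSTEM INTERFACE READ MISS", "SYSTEM INTERFACE DATA", "DATA RETURN"]
```

A consequence of this design is that a busy shared cache can shed load even on hits, trading extra interconnect traffic for shorter queues at the shared cache.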
When the shared cache tag is returned to the shared access pipeline 66 (TAG), the shared access pipeline 66 identifies the coherency state of the cache line, and identifies the owner(s) of the cache line. If the shared cache tag returned to the shared access pipeline 66 indicates that the shared cache memory 56 does not contain a coherent copy of the data (SHARED CACHE MISS), the processor interface 60 issues a SYSTEM INTERFACE READ MISS on the system interconnect 72. The processor interface 60 typically receives a response to the SYSTEM INTERFACE READ MISS via the system interface 70 that includes a coherent copy of the data requested by the processor 62 (SYSTEM INTERFACE DATA). The processor interface 60 can forward the coherent copy of the data to the processor 62 (DATA RETURN).
At a DECISION POINT 1 (e.g., an early decision point), when the shared cache memory 56 does not store a coherent copy of the data requested by the READ MISS, the shared access pipeline 66 makes a determination (based on conditions discussed below). If the determination is positive, a block of memory (e.g., one or more cache lines) in the shared cache memory 56 can be allocated for the data requested by the READ MISS. If the determination at the DECISION POINT 1 is negative, space (e.g., one or more cache lines) in the shared cache memory 56 is not allocated for the data requested by the READ MISS.
The determination at DECISION POINT 1 can be made based on one or more of the following conditions. As an example, the first condition for the DECISION POINT 1 could be based on an examination of a current level of activity of the non-inclusive cache system 52. The level of activity of the non-inclusive cache system 52 can be determined, for example, by the shared access pipeline 66 examining the size of one or more I/O data queues at the shared cache memory controller 58, the shared cache memory 56 and/or the system interface 70. If the level of activity of the non-inclusive cache system 52 is determined to be below a programmable threshold value, the first condition could be positive, but if the level of activity is determined to be at or above the programmable threshold value, the first condition could be negative.
The second condition for the DECISION POINT 1 can be based on the type of request associated with the READ MISS. For example, the second condition could be based on whether the READ MISS results from a store miss or a load miss. If the READ MISS results from a load miss, the second condition could be positive. If the READ MISS results from a store miss, the second condition could be negative.
The third condition for the DECISION POINT 1 can be based on the state of the shared cache memory 56. As an example, certain cache lines in the shared cache memory 56 can be inoperative (e.g., faulty). If there is a sufficient number of usable cache lines in the shared cache memory 56 to allocate the block of memory in the shared cache memory 56, the third condition can be positive. The number of usable cache lines can be, for example, the number of operable (e.g., not faulty) cache lines in the shared cache memory 56 with either an invalid state (‘I’ state) or an indication that the cache line data stored at the particular cache line has not been recently used. The recentness of use of a cache line can be indicated, for example, by a field of a shared cache tag associated with the cache line. Conversely, if there is an insufficient number of usable cache lines in the shared cache memory 56 to allocate the block of memory in the shared cache memory 56, the third condition can be negative.
The fourth condition for the DECISION POINT 1 can be based on the shared access pipeline 66 examining whether the non-inclusive shared cache system 52 has a currently pending request for the data requested by the READ MISS from a remote processor, such as one of the other nodes 76 (e.g., by a system snoop). If such a request is pending, the fourth condition can be negative. If such a request is not pending, the fourth condition can be positive.
The fifth condition for the DECISION POINT 1 can be based on a replacement coherency state for the cache lines that could be allocated for the block of memory. For example, if one or more of the cache lines in the shared cache memory 56 that would be allocated for the block of memory requires eviction buffer resources, the eviction buffer can be examined by the shared access pipeline 66 and/or the shared cache memory controller 58. If the eviction buffer is not full, the fifth condition can be positive. Conversely, if the eviction buffer is full, the fifth condition can be negative. Alternatively, if none of the cache lines that would be allocated for the block of memory requires eviction buffer resources, the data requested by the READ MISS could be written to the shared cache memory 56 without an eviction, and thus, the fifth condition could be positive. If one or more of the conditions discussed above is negative, the determination at DECISION POINT 1 can be negative. Alternatively, if all of the conditions discussed above are positive, the determination at DECISION POINT 1 can be positive. It will be appreciated by those skilled in the art that other combinations of the conditions, and/or other conditions, can be considered.
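The five conditions for DECISION POINT 1 combine with a logical AND. The following sketch gathers them in a hypothetical `ReadMissContext` record. It also shows one way to count usable lines, assuming each line is described by a hypothetical (faulty, state, recently_used) triple; the text does not prescribe this representation.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


def usable_lines(lines: Iterable[Tuple[bool, str, bool]]) -> int:
    """Count operable lines that are invalid ('I') or not recently used."""
    return sum(1 for faulty, state, recent in lines if not faulty and (state == "I" or not recent))


@dataclass
class ReadMissContext:
    activity: int
    threshold: int
    is_store_miss: bool
    usable: int
    lines_needed: int
    remote_request_pending: bool
    needs_eviction: bool
    eviction_buffer_full: bool


def decision_point_1(ctx: ReadMissContext) -> bool:
    """Allocate space for the READ MISS data only if all five conditions are positive."""
    return all((
        ctx.activity < ctx.threshold,                            # 1: activity below threshold
        not ctx.is_store_miss,                                   # 2: load miss, not store miss
        ctx.usable >= ctx.lines_needed,                          # 3: enough usable lines
        not ctx.remote_request_pending,                          # 4: no pending remote request
        not ctx.needs_eviction or not ctx.eviction_buffer_full,  # 5: eviction resources available
    ))
```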
At a DECISION POINT 2 (e.g., a late decision point), if the determination made at the DECISION POINT 1 was positive, such that the memory block in the shared cache memory 56 is to be allocated for the data requested by the READ MISS, the shared access pipeline 66 can make a further determination (based on one or more conditions discussed below). If the determination made at the DECISION POINT 2 is negative, the allocation of the memory block can be aborted. Conversely, if the determination made at the DECISION POINT 2 is positive, the allocation of the memory block is not aborted.
The first condition for the DECISION POINT 2 can be based on the shared access pipeline 66 examining the origin of the data provided by the SYSTEM INTERFACE DATA signal. For instance, the SYSTEM INTERFACE DATA signal could include an identifier, such as a packet identifier, that indicates the provider of the data provided with the SYSTEM INTERFACE DATA signal. As an example, the identifier could indicate whether the data provided with the SYSTEM INTERFACE DATA signal was provided from the system memory 74 or from a cache associated with one of the other nodes 76. In such a situation, the shared access pipeline 66 could be configured to make the first condition positive if the data provided with the SYSTEM INTERFACE DATA signal was provided from the system memory 74. Conversely, the shared access pipeline 66 could be configured to make the first condition negative if the data provided with the SYSTEM INTERFACE DATA signal was provided from a cache associated with one of the other nodes 76.
The second condition for DECISION POINT 2 can be based on a current level of activity of the non-inclusive cache system 52. The level of activity of the non-inclusive shared cache system 52 could be determined by the shared access pipeline 66 in the manner described above. If the level of activity of the shared cache system 52 is below the programmable threshold value, the second condition can be positive. If the level of activity of the shared cache system 52 is at or above the programmable threshold value, the second condition can be negative. If one or more of the conditions considered at DECISION POINT 2 is negative, the determination at DECISION POINT 2 can be negative. Alternatively, if all of the conditions considered at DECISION POINT 2 are positive, the determination at DECISION POINT 2 can be positive. It will be appreciated by those skilled in the art that other combinations of conditions, and/or other conditions, can be considered.
If the determinations at both the DECISION POINT 1 and the DECISION POINT 2 are positive, the processor interface 60 writes the data returned by the SYSTEM INTERFACE DATA signal to the shared cache memory 56 (DATA), and updates a tag of the shared cache tags 68 associated with the data requested by the READ MISS (TAG WRITE). If the determination at either the DECISION POINT 1 or the DECISION POINT 2 is negative, the processor interface 60 does not allocate space for, or write, the data returned by the SYSTEM INTERFACE DATA signal in the shared cache memory 56, and the shared cache tag can be updated accordingly (TAG WRITE).
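The late check and the final fill step can be sketched together. The sketch assumes the packet identifier has already been decoded into a hypothetical `DataSource` value, and it records in a hypothetical `in_shared_cache` map whether the shared cache memory 56 holds the line; the real contents of the TAG WRITE are not specified by the text.

```python
from enum import Enum
from typing import Dict, List


class DataSource(Enum):
    SYSTEM_MEMORY = "system memory 74"
    REMOTE_CACHE = "cache of another node 76"


def decision_point_2(source: DataSource, activity: int, threshold: int) -> bool:
    """Late check: keep the allocation only for memory-sourced data under low activity."""
    return source is DataSource.SYSTEM_MEMORY and activity < threshold


def finish_read_miss(
    dp1: bool,
    source: DataSource,
    activity: int,
    threshold: int,
    addr: int,
    data: bytes,
    shared_cache: Dict[int, bytes],
    in_shared_cache: Dict[int, bool],
) -> List[str]:
    """Return the signals issued once SYSTEM INTERFACE DATA arrives."""
    signals = ["DATA RETURN"]  # the requesting processor 62 always receives the data
    keep = dp1 and decision_point_2(source, activity, threshold)
    if keep:
        shared_cache[addr] = data
        signals.append("DATA")
    # Hypothetical tag bookkeeping: does shared cache memory 56 now hold the line?
    in_shared_cache[addr] = keep
    signals.append("TAG WRITE")
    return signals
```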
At a DECISION POINT, a determination by the shared access pipeline 66 (based on conditions discussed below) can be made. If the determination at the DECISION POINT is positive, the shared access pipeline 66 causes the processor interface 60 to issue a completion signal (DONE) to the processor 62, and the processor interface 60 writes the WRITEBACK data to the shared cache memory 56 (DATA), and updates the shared cache tag accordingly (TAG WRITE). It is to be understood that the completion signal (DONE) could be issued to the processor 62 before the WRITEBACK data is written to the shared cache memory 56.
Alternatively, if the determination made at the DECISION POINT is negative, the shared access pipeline 66 causes the processor interface 60 to issue a SYSTEM INTERFACE WRITEBACK of the WRITEBACK data to the system memory 74 via the system interconnect 72, whether or not space in the shared cache memory 56 has been or can be allocated for the WRITEBACK data (as indicated by SHARED CACHE HIT OR ALLOC). The processor interface 60 issues a completion signal (DONE) to the processor 62, and the processor interface 60 receives a completion signal via the system interconnect 72 (SYSTEM INTERFACE DONE) indicating that the WRITEBACK data has been written to the system memory 74. Additionally, the shared cache tag is updated accordingly (TAG WRITE). It is to be understood that the completion signal (DONE) could be issued to the processor 62 before the processor interface 60 receives the SYSTEM INTERFACE DONE signal.
The determination at the DECISION POINT can be made based on one or more of the following conditions. As an example, the first condition for the DECISION POINT could be based on an examination of a current level of activity of the non-inclusive cache system 52. The level of activity of the non-inclusive cache system 52 can be determined, for example, by the shared access pipeline 66 examining the size of one or more I/O data queues at the shared cache memory controller 58, the shared cache memory 56 and/or the system interface 70. If the level of activity of the non-inclusive cache system 52 is determined to be below a programmable threshold value, the first condition could be positive, but if the level of activity is determined to be at or above the programmable threshold value, the first condition could be negative.
The second condition for the DECISION POINT can be based on the state of the shared cache memory 56 as indicated by the SHARED CACHE HIT OR ALLOC signal. As an example, certain cache lines in the shared cache memory 56 can be inoperative (e.g., faulty). If there is a sufficient number of usable cache lines in the shared cache memory 56 to allocate the block of memory in the shared cache memory 56, the second condition can be positive. The number of usable cache lines can be, for example, the number of operable (e.g., not faulty) cache lines in the shared cache memory 56 with either an invalid state (‘I’ state) or an indication that the cache line data stored at the particular cache line has not been recently used. The recentness of use of a cache line can be indicated, for example, by a field of a shared cache tag associated with the cache line. Conversely, if there is an insufficient number of usable cache lines in the shared cache memory 56 to allocate the block of memory in the shared cache memory 56, the second condition can be negative.
The third condition for the DECISION POINT can be based on a replacement coherency state for the cache lines that could be allocated for the block of memory. For example, if one or more of the cache lines in the shared cache memory 56 that would be allocated for the block of memory requires eviction buffer resources, the eviction buffer can be examined by the shared access pipeline 66 and/or the shared cache memory controller 58. If the eviction buffer is not full, the third condition can be positive. Conversely, if the eviction buffer is full, the third condition can be negative. Alternatively, if none of the cache lines that would be allocated for the block of memory requires eviction buffer resources, the WRITEBACK data could be written to the shared cache memory 56 without an eviction, and thus, the third condition could be positive. If one or more of the conditions discussed above is negative, the determination at the DECISION POINT can be negative. Alternatively, if all of the conditions discussed above are positive, the determination at the DECISION POINT can be positive. It will be appreciated by those skilled in the art that other combinations of conditions, and/or other conditions, can be considered.
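The writeback decision and its two outcomes can be sketched as follows. Dictionaries stand in for the shared cache memory 56 and the system memory 74, and the function names are hypothetical. DONE is listed first because the text allows it to precede the data write or the SYSTEM INTERFACE DONE signal.

```python
from typing import Dict, List


def writeback_decision(activity: int, threshold: int, usable: int, lines_needed: int,
                       needs_eviction: bool, eviction_buffer_full: bool) -> bool:
    """Positive only if all three conditions for the writeback DECISION POINT hold."""
    return (
        activity < threshold                                  # condition 1: activity
        and usable >= lines_needed                            # condition 2: usable lines
        and (not needs_eviction or not eviction_buffer_full)  # condition 3: eviction buffer
    )


def process_writeback(addr: int, data: bytes, accept: bool,
                      shared_cache: Dict[int, bytes], system_memory: Dict[int, bytes]) -> List[str]:
    """Return the signals for a WRITEBACK, given the DECISION POINT outcome."""
    signals = ["DONE"]  # DONE may be issued before the data is stored anywhere
    if accept:
        shared_cache[addr] = data
        signals += ["DATA", "TAG WRITE"]
    else:
        system_memory[addr] = data
        signals += ["SYSTEM INTERFACE WRITEBACK", "SYSTEM INTERFACE DONE", "TAG WRITE"]
    return signals
```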
At a DECISION POINT, a determination by the shared access pipeline 66 can be made. The determination at the DECISION POINT can be based on information provided by the shared cache tag returned to the shared access pipeline 66 (TAG) and on a current level of activity of the non-inclusive cache system 52. The level of activity can be determined, for example, by the shared access pipeline 66 examining the size of one or more I/O data queues at the shared cache memory controller 58, the shared cache memory 56 and/or the system interface 70.
At the DECISION POINT, if the shared cache tag returned to the shared access pipeline 66 (TAG) indicates that the shared cache memory 56 contains an exclusively owned and unmodified coherent copy of the data requested by the SNOOP (DATA STATUS), the shared cache tag indicates that none of the processor caches 64 contain a coherent copy of the data (‘Eu’ state), and the level of activity of the non-inclusive shared cache system 52 is below the programmable threshold value, the following process can occur. The processor interface 60 can read the cache line from the shared cache memory 56 (DATA READ), and the shared cache memory 56 can respond with the CACHE DATA. The CACHE DATA can be provided to the source node on the system interconnect 72 (DATA), and the processor interface 60 can provide a SNOOP RESPONSE that indicates a cache hit at the non-inclusive shared cache system 52. The shared cache tag can be updated (TAG WRITE) to invalidate the cache line associated with the shared cache tag returned to the shared access pipeline 66.
Alternatively, if the shared cache tag returned to the shared access pipeline 66 (TAG) indicates that the shared cache memory 56 contains an exclusively owned and unmodified coherent copy of the data (DATA STATUS), and the shared cache tag indicates that none of the processor caches 64 contain a coherent copy of the data (‘Eu’ state), but the level of activity of the non-inclusive shared cache system 52 is at or above the programmable threshold value, the following process can occur. The shared access pipeline 66 can cause the processor interface 60 to provide a SNOOP RESPONSE to the home node on the system interconnect 72 that indicates a cache miss at the non-inclusive shared cache system 52. In such a situation, the shared cache tag can be updated (TAG WRITE) to invalidate the cache line associated with the shared cache tag, whether or not the shared cache memory 56 contained a coherent copy of the data requested by the SNOOP.
As another alternative, if the shared cache tag returned to the shared access pipeline 66 (TAG) indicates that the shared cache memory 56 contains an exclusively owned and modified (‘M’ state or ‘Mu’ state) coherent copy of the data (DATA STATUS), the following process can occur. The data requested by the SNOOP can be read from the shared cache memory 56 (DATA READ) and returned to the processor interface 60 (CACHE DATA). The processor interface 60 can forward the data provided by the CACHE DATA signal to the source node (DATA). On the system interconnect 72, the processor interface 60 can provide the home node with a SNOOP RESPONSE that indicates a cache hit at the non-inclusive shared cache system 52. Additionally, the shared cache tag can be updated (TAG WRITE) to invalidate the cache line associated with the shared cache tag.
As yet another alternative, if the shared cache tag returned to the shared access pipeline 66 (TAG) indicates that the shared cache memory 56 does not contain a coherent copy of the data (‘I’ state), or indicates that the shared cache memory 56 has a shared copy of the data (‘S’ state), the following process can occur. The shared access pipeline 66 can cause the processor interface 60 to provide, on the system interconnect 72, a SNOOP RESPONSE to the home node that indicates a cache miss at the non-inclusive cache system 52. Additionally, the shared cache tag can be updated accordingly (TAG WRITE).
In still another alternative, if the shared cache tag returned to the shared access pipeline 66 (TAG) indicates that the shared cache memory 56 contains an exclusively owned and unmodified coherent copy of the data (DATA STATUS), and that at least one of the processor caches 64 contains a coherent copy of the data (‘E’ state), the following process can occur. The shared access pipeline 66 can cause the processor interface 60 to snoop a coherent copy of the data requested by the SNOOP from a processor 62 that is associated with a cache 64 that contains a coherent copy of the data (PROC. SNOOP). The processor 62 can respond to the PROC. SNOOP with the data (if the cache 64 associated with the processor 62 still contains a coherent copy of the data) (PROC. RESPONSE) and an indication of the coherency state of the data. If the PROC. RESPONSE indicates that the data provided is a coherent copy of the data (e.g., an ‘E’ state or an ‘M’ state), the shared access pipeline 66 can cause the processor interface 60 to provide the data provided by the PROC. DATA signal to the source node (DATA). Additionally, the processor interface 60 can provide a cache hit for the non-inclusive shared cache system 52 to the home node on the system interconnect 72 (SNOOP RESPONSE) and update the shared cache tag accordingly (TAG WRITE).
If the PROC. RESPONSE indicates that the cache 64 associated with the processor 62 no longer contains a coherent copy of the data (‘I’ state), and the level of activity of the non-inclusive shared cache system 52 is below the programmable threshold value, the following process can occur. The data requested by the SNOOP can be read from the shared cache memory 56 (DATA READ) and returned to the processor interface 60 (CACHE DATA). The processor interface 60 can forward the data provided by the CACHE DATA signal to the source node (DATA). On the system interconnect 72, the processor interface 60 can provide the home node with a SNOOP RESPONSE that indicates a cache hit at the non-inclusive shared cache system 52, and can update the shared cache tag accordingly (TAG WRITE).
Conversely, if the PROC. RESPONSE indicates that the cache 64 associated with the processor 62 no longer contains a coherent copy of the data (‘I’ state), but the level of activity of the non-inclusive shared cache system 52 is at or above the programmable threshold value, the following process can occur. The shared access pipeline 66 can cause the processor interface 60 to provide a SNOOP RESPONSE to the home node on the system interconnect 72 that indicates a cache miss at the non-inclusive shared cache system 52. In such a situation, the shared cache tag can be updated (TAG WRITE) to invalidate the cache line associated with the shared cache tag, whether or not the shared cache memory 56 contained a coherent copy of the data requested by the SNOOP.
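The snoop cases above can be collected into one dispatch on the tag state. The sketch below returns whether the SNOOP RESPONSE reports a hit and what data, if any, goes to the source node; tag writes are omitted for brevity. The PROC. SNOOP is modelled as a hypothetical callback, and treating any processor state other than 'E' or 'M' like 'I' is an assumption, since the text only describes those responses.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class State(Enum):
    E = "E"
    EU = "Eu"
    M = "M"
    MU = "Mu"
    S = "S"
    I = "I"


ProcSnoop = Callable[[int], Tuple[State, Optional[bytes]]]


def handle_system_snoop(
    state: State,
    addr: int,
    shared_cache: Dict[int, bytes],
    below_threshold: bool,
    proc_snoop: ProcSnoop,
) -> Tuple[bool, Optional[bytes]]:
    """Return (SNOOP RESPONSE is a hit, data sent to the source node)."""
    if state in (State.M, State.MU):
        # Modified copy: supplied from shared cache memory 56 with no activity check.
        return True, shared_cache[addr]
    if state is State.E:
        # A processor cache 64 may hold the line: issue a PROC. SNOOP first.
        proc_state, proc_data = proc_snoop(addr)
        if proc_state in (State.E, State.M):
            return True, proc_data
        # Processor copy is gone: handle like the 'Eu' case below.
        state = State.EU
    if state is State.EU:
        if below_threshold:
            return True, shared_cache[addr]
        # Busy: report a miss even though the shared cache held the data.
        return False, None
    # 'I' or 'S': report a miss.
    return False, None
```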
What have been described above are examples of the present invention. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the present invention, but one of ordinary skill in the art will recognize that many further combinations and permutations of the present invention are possible. Accordingly, the present invention is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims.