This disclosure relates generally to symmetric multi-processor (SMP) environments, and more specifically to implementing a hot coherency state to a cache coherency protocol for use in an SMP platform.
In general, an SMP platform is a multi-processor computer architecture in which two or more identical processors are connected to a single shared main memory. A typical SMP platform relies on a cache coherency protocol to maintain the integrity of data in the platform by identifying the states that a cache line can have at any point in time. One cache coherency protocol used in SMP platforms is the MESI protocol, which has four states that can identify a cache line: Modified (M), Exclusive (E), Shared (S), and Invalid (I). A cache line is in the Modified state if a processor has modified it from the value in main memory. A cache line is in the Exclusive state if only one processor in the SMP platform holds it and the copy has not been modified. A cache line is in the Shared state if it may be stored in the caches of more than one processor. A cache line is in the Invalid state if the cache does not hold a valid copy of it.
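The four MESI states described above can be sketched as a small state machine for a single cache line in a single cache. This is a simplified illustration only: the event names (`local_read`, `local_write`, `remote_read`, `remote_write`) and the `others_have_copy` flag are hypothetical, and bus transactions and write-backs are not modeled.

```python
from enum import Enum


class MESI(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Return the state of one cache line in one cache after `event`.

    Event names are illustrative assumptions, not part of the protocol text.
    """
    if event == "local_write":
        # Writing always leaves this cache with the only, modified copy.
        return MESI.MODIFIED
    if event == "remote_write":
        # Another processor took the line with intent to modify.
        return MESI.INVALID
    if event == "remote_read":
        # A valid copy is now held by more than one cache.
        return state if state is MESI.INVALID else MESI.SHARED
    if event == "local_read":
        if state is MESI.INVALID:
            # A read miss fills as Shared or Exclusive depending on other holders.
            return MESI.SHARED if others_have_copy else MESI.EXCLUSIVE
        return state
    raise ValueError(f"unknown event: {event}")
```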
The MESI protocol is only one example of a cache coherency protocol available for use in an SMP platform. Other cache coherency protocols include, but are not limited to, the MOSI, MOESI, and MESIF protocols. The MOSI protocol has four states that can identify a cache line: Modified (M), Owned (O), Shared (S), and Invalid (I). A cache line is in the Owned state if it holds the most recent, correct copy of the data. The MOESI protocol has five states that can identify a cache line: Modified (M), Owned (O), Exclusive (E), Shared (S), and Invalid (I). The MESIF protocol has five states that can identify a cache line: Modified (M), Exclusive (E), Shared (S), Invalid (I), and Forward (F). A cache line is in the Forward state if it holds a copy of the data from which further copies can be made.
In a typical SMP platform, data transfers and modifications by processors occur very frequently. None of the currently available cache coherency protocols represents cache lines that have been frequently modified. Modified cache lines therefore pass from one processor to the next without any history of the modifications being passed along. As a result, considerable time is spent writing cache lines to the caches in the processors and to main memory, which increases the overall latency of the SMP platform and decreases the performance of applications running on the platform.
Therefore, it would be desirable to have a cache coherency protocol that could accommodate cache lines that have been modified by the processors in the SMP platform. Adding a state to represent modified cache lines would provide some history behind the modifications, which could be used to track and manage these cache lines in a way that improves latency and overall performance.
In one embodiment, there is a method of implementing a hot coherency state to a cache coherency protocol used in a multi-processor computer system having a plurality of processor agents, wherein each of the plurality of processor agents has at least one cache. In this embodiment, the method comprises issuing a snoop request for a cache line with intent to modify from one of the plurality of processor agents; determining whether the requested cache line is stored within the cache of one of the non-issuing plurality of processor agents; ascertaining whether the requested cache line has been read and modified if present in the cache of one of the non-issuing plurality of processor agents; designating the cache line as being in a hot coherency state in response to ascertaining that the cache line has been read and modified; forwarding the cache line in the hot coherency state to the processor agent that issued the snoop request for modification; and storing the modified cache line in the hot coherency state in the cache of the processor agent that modified the cache line to facilitate fast access to the cache line in response to a future request to read with intent to modify.
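The steps of this method can be sketched as a small software model. It is an illustration under stated assumptions, not the disclosed hardware: the `Agent` class, the `rwitm` function, and the `"M"`/`"H"` state labels are hypothetical names, a line is treated as "read and modified" when its holder has it in the Modified or hot state, and the holder is assumed to give up its copy when it forwards the line.

```python
from dataclasses import dataclass, field

MODIFIED, HOT = "M", "H"  # illustrative state labels


@dataclass
class Agent:
    name: str
    cache: dict = field(default_factory=dict)  # address -> [data, state]


def rwitm(requester, agents, memory, address, modify):
    """Model a read-with-intent-to-modify request for `address`.

    Returns the state in which the requester ends up holding the line.
    """
    # Snoop every non-issuing agent for the requested line.
    for other in agents:
        if other is requester:
            continue
        entry = other.cache.get(address)
        if entry is None:
            continue  # this agent does not hold the line
        data, state = entry
        # Has the holder read and modified the line?
        if state in (MODIFIED, HOT):
            # Designate the line hot and forward it to the requester.
            # Assumption: the forwarding agent drops its own copy.
            del other.cache[address]
            # The requester modifies the line and keeps it in the hot state.
            requester.cache[address] = [modify(data), HOT]
            return HOT
    # No other cache holds a modified copy: fetch from main memory.
    requester.cache[address] = [modify(memory[address]), MODIFIED]
    return MODIFIED
```

In this model, a first request by one agent is served from `memory` and leaves the line Modified; a later request for the same address by a different agent finds the modified copy, and the line ends up in the hot state in the second agent's cache.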
In another embodiment, there is a computer system that comprises main memory and a plurality of processor agents each having a last level cache and a hot cache. Each processor agent is configured to store cache lines in the last level cache and the hot cache. The hot cache is configured to store cache lines in the hot coherency state, wherein cache lines in the hot coherency state are cache lines that have been read and modified. The hot cache is smaller in size than the last level cache to facilitate fast access to the cache lines in the hot coherency state in response to a future request to read with intent to modify. A bus connects each of the plurality of processor agents to the main memory.
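The arrangement of a last level cache alongside a smaller hot cache can be sketched as follows. The `ProcessorAgent` class, its capacities, and the oldest-first eviction policy are assumptions made for illustration; the text above specifies only that the hot cache holds lines in the hot coherency state and is smaller than the last level cache.

```python
from collections import OrderedDict

HOT = "H"  # illustrative label for the hot coherency state


class ProcessorAgent:
    def __init__(self, llc_lines=8, hot_lines=2):
        if hot_lines >= llc_lines:
            raise ValueError("hot cache must be smaller than the last level cache")
        self.llc_lines = llc_lines
        self.hot_lines = hot_lines
        self.llc = OrderedDict()        # address -> (data, state)
        self.hot_cache = OrderedDict()  # address -> (data, state), hot lines only

    def store(self, address, data, state):
        """Store a line; return an evicted (address, (data, state)) or None.

        The caller is responsible for writing an evicted line back.
        """
        # Keep each line in exactly one of the two structures.
        self.llc.pop(address, None)
        self.hot_cache.pop(address, None)
        if state == HOT:
            target, limit = self.hot_cache, self.hot_lines
        else:
            target, limit = self.llc, self.llc_lines
        target[address] = (data, state)
        if len(target) > limit:
            # Assumed policy: evict the oldest-inserted line.
            return target.popitem(last=False)
        return None

    def lookup(self, address):
        """Check the small hot cache first, then the last level cache."""
        if address in self.hot_cache:
            return self.hot_cache[address]
        return self.llc.get(address)
```

Because the hot cache holds far fewer lines than the last level cache, a lookup for a line in the hot state searches a much smaller structure, which is the intuition behind the fast access described above.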
In a third embodiment, there is a computer-readable medium storing computer instructions for implementing a hot coherency state to a cache coherency protocol used in a multi-processor computer system having a plurality of processor agents, wherein each of the plurality of processor agents has at least one cache. In this embodiment, the computer instructions comprise issuing a snoop request for a cache line with intent to modify from one of the plurality of processor agents; determining whether the requested cache line is stored within the cache of one of the non-issuing plurality of processor agents; ascertaining whether the requested cache line has been read and modified if present in the cache of one of the non-issuing plurality of processor agents; designating the cache line as being in a hot coherency state in response to ascertaining that the cache line has been read and modified; forwarding the cache line in the hot coherency state to the processor agent that issued the snoop request for modification; and storing the modified cache line in the hot coherency state in the cache of the processor agent that modified the cache line to facilitate fast access to the cache line in response to a future request to read with intent to modify.
As shown in
In this example, the snoop responses from the other processor agents (Processor 1, Processor 2, and Processor 3) 102 come back clean or indicate the Invalid state (i.e., the cache line is not stored by the caches associated with those processor agents). Processor Agent 0 then accesses the cache line from main memory 110, modifies it, and stores it in its L2 cache 106. These actions are illustrated in
Later in time, Processor Agent 1 sends out a snoop request for that same cache line with RWITM to the other processor agents (Processor 0, Processor 2, and Processor 3). These actions are illustrated in
After Processor Agent 1 receives the hot cache line, it will then modify the cache line and store it in the hot cache 108. These actions are illustrated in
Referring back to the example shown in
It is apparent that this disclosure provides an approach for implementing a hot coherency state to a cache coherency protocol used in a symmetric multi-processor environment. While the disclosure has been particularly shown and described in conjunction with a preferred embodiment thereof, it will be appreciated that variations and modifications can be effected by a person of ordinary skill in the art without departing from the scope of the disclosure.