Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
In the following description, numerous specific details are set forth. However, embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.
The four caching nodes (120, 140, 160 and 180) may be any type of system component having a cache memory, for example, a processor. In one embodiment, the caching nodes and node controller may be interconnected via multiple point-to-point links (190, 191, 192, 193, 194, 195, 196, 197, 198, and 199).
In one embodiment, node controller 110 may include snoop filter 112 and processing/control agent 114. Node controller 110 may also include additional circuits and functionality. In one embodiment, node controller 110 may be a gateway for communication beyond the cluster. Node controller 110 may also operate as a proxy home or caching agent for cluster agents, if any. Node controller 110 may also serve as a proxy for the caching agents in the local cluster.
In one embodiment, snoop filter 112 may be a table or other type of tracking mechanism having the ability to track data stored in the caches of cluster 100. Snoop filter 112 may be any type of structure that provides this tracking functionality. As described in greater detail below, snoop filter 112 may allow node controller 110 to direct requests to nodes of cluster 100 rather than requesting data from nodes outside the cluster if snoop filter 112 indicates that the data is available within cluster 100. Various techniques to accomplish this are described herein.
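As an illustration only, the tracking function of the snoop filter may be sketched as a table that maps a block address to the set of cluster nodes believed to hold a cached copy. The names SnoopFilter, record, evict, holders and route_request are hypothetical and do not appear in the embodiments described; the sketch assumes one entry per block address and says nothing about the actual structure of snoop filter 112.

```python
from typing import Dict, Set


class SnoopFilter:
    """Hypothetical table: block address -> ids of cluster nodes caching it."""

    def __init__(self) -> None:
        self._holders: Dict[int, Set[int]] = {}

    def record(self, address: int, node_id: int) -> None:
        # Note that node_id now holds a cached copy of the block.
        self._holders.setdefault(address, set()).add(node_id)

    def evict(self, address: int, node_id: int) -> None:
        # Remove node_id from the entry; drop the entry when it becomes empty.
        holders = self._holders.get(address)
        if holders is None:
            return
        holders.discard(node_id)
        if not holders:
            del self._holders[address]

    def holders(self, address: int) -> Set[int]:
        # Return a copy so callers cannot mutate the table.
        return set(self._holders.get(address, ()))


def route_request(snoop_filter: SnoopFilter, address: int) -> str:
    """Return 'local' when some cluster node caches the block, else 'home'."""
    return "local" if snoop_filter.holders(address) else "home"


sf = SnoopFilter()
sf.record(0x1000, 160)  # caching node 160 holds block 0x1000
print(route_request(sf, 0x1000))
print(route_request(sf, 0x2000))
```

In this sketch a request for a tracked block is directed within the cluster, while a request for an untracked block falls through to a source outside the cluster.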
Circumstances may arise where a caching node has requested data available in one of its caches, yet requests the data from other nodes. For example, if caching node 160 requests a block of data, a first operation (e.g., a prefetch) may be to check a second level (L2) cache to determine whether the requested block of data is stored in the cache.
It is possible for the caching node to generate a read request if the data is not in the L2 cache even if the requested block of data is in a different cache level of the caching node. The data may be referred to as “Buried-M” data because the modified (i.e., “M”) data block is buried in the cache structure of the requesting caching node, and the resulting condition may be referred to as a “Buried HitM” condition. As used herein, “HitM” refers to a condition in which a caching agent responds to a snoop request with a hit to a modified (“M”) line. When an external snoop hits a Buried-M block of data, the extracted data cannot be forwarded to the snoop owner because the snooped node has a request to memory pending. The result of the cache miss and the corresponding read request may be an inefficient use of system resources.
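The Buried-M condition may be illustrated with a minimal sketch in which a caching node is modeled as a set of named cache levels, each mapping a block address to a state letter. The CachingNode class, its fields and the is_buried_m method are hypothetical names introduced for illustration; the sketch assumes that only the L2 cache is checked before a read request is generated.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CachingNode:
    # Hypothetical model: cache level name -> {block address: state letter}.
    levels: Dict[str, Dict[int, str]] = field(default_factory=dict)

    def l2_lookup(self, address: int) -> bool:
        """Return True when the block is present in the L2 cache."""
        return address in self.levels.get("L2", {})

    def is_buried_m(self, address: int) -> bool:
        """True when the L2 check misses but another level holds the block in M."""
        if self.l2_lookup(address):
            return False
        return any(
            lines.get(address) == "M"
            for name, lines in self.levels.items()
            if name != "L2"
        )


# The block at 0x40 is modified in L3 but absent from L2.
node = CachingNode(levels={"L2": {}, "L3": {0x40: "M"}})
print(node.is_buried_m(0x40))
```

A node in this state would miss in its L2 cache and issue a read request even though it already holds the modified block.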
As described herein, the Buried HitM condition may be resolved through use of a conflict message referred to herein as a “RspCnfltOwn” message. In one embodiment, upon receiving a RspCnfltOwn message, node controller 110 may prioritize the request from the sender of the RspCnfltOwn message over all others. That is, the caching node with the buried data is selected as the winner from all the conflicting requesters.
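The prioritization described above may be sketched as a selection over the conflicting requesters. The function select_winner and its arguments are hypothetical. The fallback used when no requester sent a RspCnfltOwn message (the earliest request wins) is an assumption of this sketch and is not taken from the embodiments described.

```python
from typing import Dict, List, Optional


def select_winner(
    requesters: List[str], snoop_responses: Dict[str, str]
) -> Optional[str]:
    """Pick the requester whose request is serviced first.

    requesters is ordered by request arrival; snoop_responses maps a
    requester to the response message it sent to the node controller.
    """
    for requester in requesters:
        if snoop_responses.get(requester) == "RspCnfltOwn":
            # The node holding the buried data wins over all other requesters.
            return requester
    # Assumption of this sketch: with no RspCnfltOwn, the earliest request wins.
    return requesters[0] if requesters else None


winner = select_winner(
    ["Processor 1", "Processor 2"],
    {"Processor 1": "RspCnflt", "Processor 2": "RspCnfltOwn"},
)
print(winner)
```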
In one embodiment, processing/control agent 114 may access snoop filter 112 to determine whether a Buried HitM condition exists. Processing/control agent 114 may provide the functionality of node controller 110 and may be implemented as hardware, software, firmware or any combination thereof.
Specifically, the physical layer may provide communication between two ports over a physical interconnect comprising two uni-directional links. One uni-directional link 204 runs from a first transmit port 250 of a first integrated device to a first receiver port 250 of a second integrated device. Likewise, a second uni-directional link 206 runs from a first transmit port 250 of the second integrated device to a first receiver port 250 of the first integrated device. However, the claimed subject matter is not limited to two uni-directional links.
Processor 2 may be in need of a block of data that is stored in a cache (e.g., an L3 cache) associated with Processor 2. If a Buried-M condition exists (as illustrated by the “M” by Processor 2), Processor 2 may request the block of data by sending a Data Request message to the node controller and a Snoop Request message to Processor 1. Processor 1 may respond to the Snoop Request message with a Response message to the node controller. The Response message may indicate whether Processor 1 has a copy of the requested data and the state of the data (e.g., Modified, Invalid).
In one embodiment, in response to receiving the Data Request message from Processor 2, the node controller may access the snoop filter to determine whether any node in the cluster has a cached copy of the requested data. In the example of
If the node controller did not have the snoop filter, in response to the Data Request message the node controller would send a data request to the home node corresponding to the requested data. In one embodiment, the home node is the node having non-cache memory corresponding to the requested data. In general, a data request to a home node incurs greater latency than acquiring the requested data from local, cached sources. Thus, if the node controller can determine that the data is available locally and avoid requests to the home node, overall system performance may be improved.
In one embodiment, the node controller may send a Dummy Snoop message to Processor 2. The Dummy Snoop message may indicate the node controller as the snoop requester. The Dummy Snoop message may operate to verify that Processor 2 does have a copy of the requested data. In response to the Dummy Snoop message, Processor 2 may transmit a Response Conflict Own (RspCnfltOwn) message to the node controller.
In response to receiving the Response Conflict Own message, the node controller may send an Exclusive Data with Completion (DataE(Dummy)_Cmp) message. This message may give Processor 2 ownership of the requested data and signal completion of the data acquisition cycle started by the Data Request message from Processor 2.
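The exchange described above for a single requester may be summarized as an ordered message trace. The following is a minimal sketch, not a definitive implementation: the function single_requester_flow, the tuple layout and the string labels are hypothetical, and the branch taken when the snoop filter shows no local copy (a request to a home node) is an assumption added only to make the example complete.

```python
from typing import List, Set, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, message name)


def single_requester_flow(
    requester: str, peer: str, holders: Set[str]
) -> List[Message]:
    """Build the message trace for one requester.

    holders stands in for the snoop filter result: the set of cluster
    nodes believed to cache the requested block.
    """
    nc = "NodeController"
    trace: List[Message] = [
        (requester, nc, "DataRequest"),
        (requester, peer, "SnoopRequest"),
        (peer, nc, "Response"),
    ]
    if requester in holders:
        # Buried-M case: verify with a dummy snoop, then grant ownership.
        trace += [
            (nc, requester, "DummySnoop"),
            (requester, nc, "RspCnfltOwn"),
            (nc, requester, "DataE(Dummy)_Cmp"),
        ]
    else:
        # Assumption of this sketch: fall back to the home node.
        trace.append((nc, "HomeNode", "DataRequest"))
    return trace


for sender, receiver, name in single_requester_flow(
    "Processor 2", "Processor 1", {"Processor 2"}
):
    print(f"{sender} -> {receiver}: {name}")
```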
Processor 2 may be in need of a block of data that is stored in a cache (e.g., an L3 cache) associated with Processor 2. If a Buried-M condition exists (as illustrated by the “M” by Processor 2), Processor 2 may request the block of data by sending a Data Request(2) message to the node controller and a Snoop Request(2) message to Processor 1. Before Processor 2 acquires the requested data, Processor 1 may request the same block of data by sending a Data Request(1) message to the node controller and a Snoop Request(1) message to Processor 2.
When the node controller receives the Data Request(2) message, the node controller may determine, via the snoop filter, that Processor 2 has a cached copy of the requested data. Similarly, when the node controller receives the Data Request(1) message, the node controller may determine, via the snoop filter, that Processor 2 has a cached copy of the requested data. In response to receiving the Snoop Request(1) message, Processor 2 identifies a conflict and sends a RspCnfltOwn message to the node controller. In response to receiving the Snoop Request(2) message, Processor 1 also identifies a conflict and sends a Response Conflict (RspCnflt) message to the node controller.
Because of the conflicting requests for the block of data, the node controller may send a DataE Forward (DataE(Dummy)_Fwd) message to Processor 2. This message may give Processor 2 ownership of the requested data and indicate that the data should be forwarded after the data is used. Processor 2 may respond with a Conflict Acknowledge (AckCnflt) message to the node controller. Upon receiving ownership of the requested data, Processor 2 may perform the operation(s) for which the block of data was requested.
The node controller may then send Processor 2 a Complete-Forward (Cmp_Fwd) message to indicate completion of the data acquisition cycle started by the Data Request message from Processor 2 and that Processor 2 should forward the data to Processor 1 when finished using the data. Processor 2 may forward the data to Processor 1 with a Data Modified (Data_M) message.
Processor 2 may indicate to the node controller that the requested data has been forwarded with a Response Forward (RspFwd) message. In response to the RspFwd message, the node controller may send a Complete (Cmp) message to Processor 1 to signal completion of the data acquisition cycle started by the Data Request message from Processor 1.
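The two-requester exchange described above may likewise be written out as an ordered trace, which makes the relative order of the forwarding and completion messages easier to check. The names CONFLICT_TRACE and index_of are hypothetical, and the abbreviations P1, P2 and NC stand for Processor 1, Processor 2 and the node controller. The ordering follows the description; where the description does not fix the relative order of two messages, the order shown is an assumption of this sketch.

```python
from typing import List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, message name)

CONFLICT_TRACE: List[Message] = [
    ("P2", "NC", "DataRequest(2)"),
    ("P2", "P1", "SnoopRequest(2)"),
    ("P1", "NC", "DataRequest(1)"),
    ("P1", "P2", "SnoopRequest(1)"),
    ("P2", "NC", "RspCnfltOwn"),       # P2 holds the buried data
    ("P1", "NC", "RspCnflt"),
    ("NC", "P2", "DataE(Dummy)_Fwd"),  # ownership, forward after use
    ("P2", "NC", "AckCnflt"),
    ("NC", "P2", "Cmp_Fwd"),           # completes P2's cycle
    ("P2", "P1", "Data_M"),            # data forwarded to the other requester
    ("P2", "NC", "RspFwd"),
    ("NC", "P1", "Cmp"),               # completes P1's cycle
]


def index_of(trace: List[Message], name: str) -> int:
    """Return the position of the first message with the given name."""
    return next(i for i, (_, _, msg) in enumerate(trace) if msg == name)


# Processor 1's cycle completes only after the data has been forwarded to it.
print(index_of(CONFLICT_TRACE, "Data_M") < index_of(CONFLICT_TRACE, "Cmp"))
```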
Processor 2 may be in need of a block of data that is stored in a cache (e.g., an L3 cache) associated with Processor 2. If a Buried-M condition exists (as illustrated by the “M” by Processor 2), Processor 2 may request the block of data by sending a Data Request(2) message to the local node controller and a Snoop Request(2) message to Processor 1. Before Processor 2 acquires the requested data, a remote node controller may request the same block of data by sending a Data Request(R) message to the local node.
When the local node controller receives the Data Request(2) message, the local node controller may determine, via the snoop filter, that Processor 2 has a cached copy of the requested data. Similarly, when the local node controller receives the Data Request(R) message from the remote node controller, the local node controller may determine, via the snoop filter, that Processor 2 has a cached copy of the requested data. In one embodiment, the local node controller may wait until one or more Snoop Response messages are received before sending a subsequent message related to the Data Request(2) and Data Request(R) messages.
In response to receiving the Snoop Request(R) message, Processor 2 may identify a conflict and send a RspCnfltOwn message to the node controller. Processor 1 may respond to the Snoop Request(2) message with a Response message to the local node controller. The Response message may indicate whether or not Processor 1 has a copy of the requested data and the state of the data (e.g., Modified, Invalid).
Because of the conflicting requests for the block of data, the local node controller may send a DataE Forward (DataE(Dummy)_Fwd) message to Processor 2. This message may give Processor 2 ownership of the requested data and indicate that the data should be forwarded after the data is used. Processor 2 may respond with a Conflict Acknowledge (AckCnflt) message to the local node controller. Upon receiving ownership of the requested data, Processor 2 may perform the operation(s) for which the block of data was requested.
The local node controller may then send Processor 2 a Complete-Forward (Cmp_Fwd) message to indicate completion of the data acquisition cycle started by the Data Request message from Processor 2 and that Processor 2 should forward the data to the local node controller when finished using the data. Processor 2 may indicate to the node controller that the requested data will be forwarded with a Response Forward (RspFwd) message.
Processor 2 may forward the data to the local node controller with a Data Modified (Data_M) message. When the local node controller receives the forwarded data from Processor 2, the local node controller may send the requested data and a Snoop Response message to the remote node controller. The remote node controller may then send the requested data to the requesting entity.
In one embodiment, each cluster (610, 620, 630, 640) is configured similarly to the cluster of
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.