Cache coherency control method, chipset, and multi-processor system

Information

  • Patent Application
  • 20070156972
  • Publication Number
    20070156972
  • Date Filed
    August 31, 2006
  • Date Published
    July 05, 2007
Abstract
In a multi-processor system, counting snoop results is a bottleneck for a broadcast-based snoop protocol, while a directory-based protocol lengthens latency when a remote node caches the data. There is a need to shorten memory access latency by combining snooping with cache copy tag information. When the requested address is registered in the local node's cache copy tag, memory access latency can be shortened by omitting the process of counting snoop results. When memory position information is used to decide how the cache copy tag is updated during cache replacement, the copy tag hit ratio for later reaccess from the local node can be increased.
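The two decisions summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described by the application: the names `CacheState`, `copy_tag_hit`, `needs_snoop_count`, and `state_after_replacement` are hypothetical, the hit rule follows the wording of claims 2 and 7, and the replacement rule follows claims 3, 8, and 16.

```python
from enum import Enum


class CacheState(Enum):
    """Cache states named in the claims (hypothetical encoding)."""
    I = "invalid"     # no cached copy
    S = "shared"
    E = "exclusive"


def copy_tag_hit(state: CacheState, allows_shared: bool) -> bool:
    """Return True when the local node's copy tag counts as a hit.

    A miss request that permits the shared state hits on S or E;
    a request that permits only the exclusive state hits on E alone.
    """
    if state is CacheState.E:
        return True
    return allows_shared and state is CacheState.S


def needs_snoop_count(state: CacheState, allows_shared: bool) -> bool:
    """True when the requester must wait for snoop results to be counted."""
    return not copy_tag_hit(state, allows_shared)


def state_after_replacement(state: CacheState, home_node: int, local_node: int) -> CacheState:
    """Copy tag state after the processor evicts a line.

    Assumption: node ids are plain integers. The entry is invalidated only
    when the line's memory belongs to a remote node; for local memory the
    state is kept so that a later local reaccess can still hit.
    """
    return state if home_node == local_node else CacheState.I
```

With these definitions, `needs_snoop_count(CacheState.S, allows_shared=False)` is true because a shared copy does not satisfy an exclusive-only request, whereas the same entry lets a shared-permitting request return memory data without waiting for the snoop count.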
Description

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an overall construction of embodiments 1 and 3 according to the invention;

FIG. 2 is a construction diagram of a cache copy tag 210;

FIG. 3 is a construction diagram when the cache copy tag 210 is 4-way set associative;

FIG. 4 is a construction diagram showing the construction of an address 700 for a transaction 500;

FIG. 5 is a table showing categories of a cache state 211;

FIG. 6 is a data construction diagram showing the construction of a transaction 500;

FIG. 7 is a table showing categories of a cache miss request 500;

FIG. 8 is a table showing categories of a cache replace notification 510;

FIG. 9 is a table showing categories of a snoop response 530 and a snoop result 630;

FIG. 10 is a table showing categories of a system transaction;

FIG. 11 is a table showing a comparison among the respective protocols;

FIG. 12 is a conceptual diagram showing a configuration switch 250 for modification 2 of embodiment 1, and embodiments 2 and 3;

FIG. 13 is a conceptual diagram showing the construction of the cache copy tag 210 for modification 3 of embodiment 1 and a modification of embodiment 3;

FIG. 14 is a block diagram showing the overall construction of embodiment 2 according to the invention;

FIG. 15 is a basic flowchart of a cache miss request in embodiment 1;

FIG. 16 is a detailed flowchart for “retrieve local node copy tag” at Step 1100;

FIG. 17 is a detailed flowchart for “update local node copy tag” at Step 1090;

FIG. 18 is a basic flowchart

FIG. 19 is a flowchart showing a cache replacement process according to modification 1 of embodiment 1;

FIG. 20 is a flowchart showing a cache replacement process according to modification 2 of embodiment 1;

FIG. 21 is a flowchart showing a cache replacement process according to modification 3 of embodiment 1;

FIG. 22 is a basic flowchart of a remote node snoop process according to embodiment 1;

FIG. 23 is a detailed flowchart for “update remote node copy tag” at Step 1445;

FIG. 24 is a detailed flowchart for “retrieve remote node copy tag” at Step 1400 in embodiment 3; and

FIG. 25 is a detailed flowchart for “retrieve remote node copy tag” at Step 1400 in a modification of embodiment 3.


Claims
  • 1. A cache coherency control method for a multi-processor system, in which the multi-processor system includes multiple nodes connected to each other via a system connection network, each node includes one or more processors, a node controller, and a memory unit, each memory unit has one of a plurality of main storage portions constituting part of a main storage shared by processors of the plurality of nodes, the processor has a cache to hold data retrieved from the memory unit, and the node controller has an address tag for data maintained in the cache of the processor, a cache copy tag for maintaining a cache state, and a cache copy tag management unit for managing the cache copy tag, the cache coherency control method comprising: allowing the requesting node controller to broadcast a snoop request to another node controller via the system connection network when accessing a memory unit in accordance with a cache miss request from the processor; allowing the node controller also to retrieve the cache copy tag for the requesting node; allowing the cache copy tag management unit to issue an advanced response notification when the address is registered to a cache copy tag of a local node; and returning response data returned from the memory unit to the processor without a wait for counting a snoop result for the broadcast snoop request.
  • 2. The cache coherency control method for a multi-processor system according to claim 1, wherein the cache copy tag has at least three cache states such as I indicating no cache, S indicating shared state, and E indicating exclusive state; and wherein, when the processor issues a cache miss request for allowing shared state and retrieving the cache copy tag for the requesting node results in cache state S or E for a relevant address, or when the processor issues a cache miss request for allowing only exclusive state and retrieving the cache copy tag for the requesting node results in cache state E for a relevant address, the cache copy tag is assumed to be hit and response data returned from the memory unit is returned to the processor without wait for counting a snoop result for the broadcast snoop request.
  • 3. The cache coherency control method for a multi-processor system according to claim 2, wherein, when data is excluded from the cache of the processor, there is provided means for identifying to which node a memory unit maintaining the data belongs; wherein, when the memory unit does not belong to a node excluded from the cache, a relevant entry of the cache copy tag is provided with a cache state changed to I; and wherein, when the memory unit belongs to a node excluded from the cache, a relevant entry of the cache copy tag is provided with a cache state unchanged.
  • 4. The cache coherency control method for a multi-processor system according to claim 2, wherein the node controller includes a cache replacement control switch; and wherein, when data is excluded from the cache of the processor, a state of the cache replacement control switch specifies whether or not to change a cache state for a relevant entry of the cache copy tag.
  • 5. The cache coherency control method for a multi-processor system according to claim 2, wherein each entry of the cache copy tag includes not only an address tag and a cache state, but also a replace flag to indicate that data is excluded; wherein, when the processor caches data in accordance with a cache miss request from the processor, the replace flag for the entry is cleared; wherein, when data is excluded from the cache of the processor, the replace flag is set for an entry corresponding to the data in the cache copy tag; and wherein, when the replace flag is set during retrieval of the cache copy tag in accordance with a snoop request from a remote node, cache state I is responded independently of cache states and, when the cache copy tag is retrieved in accordance with a cache miss request from a local node, the cache state is responded even though the replace flag is set.
  • 6. A node controller for connecting a processor bus connecting with one or more processors, a memory unit, and a system connection network, wherein each node is composed of the one or more processor buses, the one or more memory units, and the one or more node controllers; wherein each memory unit has one of a plurality of main storage portions constituting part of a main storage shared by the plurality of processors; wherein the processor includes a cache to maintain data acquired from the memory unit; wherein the node controller includes: a cache copy tag to maintain an address tag and a cache state for data maintained in the cache of the processor; and a cache copy tag management unit to manage the cache copy tag; wherein, when accessing a memory unit in accordance with a cache miss request from the processor, the requesting node controller broadcasts a snoop request to another node controller via the system connection network; wherein the node controller also retrieves the cache copy tag for the requesting node; wherein, when the address is registered to a cache copy tag of a local node, the cache copy tag management unit issues an advanced response notification; and wherein response data returned from the memory unit is returned to the processor without wait for counting a snoop result for the broadcast snoop request.
  • 7. The node controller according to claim 6, wherein the cache copy tag has at least three cache states such as I indicating no cache, S indicating shared state, and E indicating exclusive state; and wherein, when the processor issues a cache miss request for allowing shared state and retrieving a cache copy tag for a requesting node results in cache state S or E for a relevant address, or when the processor issues a cache miss request for allowing only exclusive state and retrieving the cache copy tag for a requesting node results in cache state E for a relevant address, the cache copy tag is assumed to be hit and response data returned from the memory unit is returned to the processor without wait for counting a snoop result for the broadcast snoop request.
  • 8. The node controller according to claim 7, wherein, when data is excluded from a cache of the processor, there is provided means for identifying to which node a memory unit maintaining the data belongs; wherein, when the memory unit does not belong to a node excluded from the cache, a relevant entry of the cache copy tag is provided with a cache state changed to I; and wherein, when the memory unit belongs to a node excluded from the cache, a relevant entry of the cache copy tag is provided with a cache state unchanged.
  • 9. The node controller according to claim 7, wherein the node controller includes a cache replacement control switch; and wherein, when data is excluded from a cache of the processor, a state of the cache replacement control switch specifies whether or not to change a cache state for a relevant entry of the cache copy tag.
  • 10. The node controller according to claim 7, wherein each entry of the cache copy tag includes not only an address tag and a cache state, but also a replace flag to indicate that data is excluded; wherein, when the processor caches data in accordance with a cache miss request from the processor, the replace flag for the entry is cleared; wherein, when data is excluded from the cache of the processor, the replace flag is set for an entry corresponding to the data in the cache copy tag; wherein, when the replace flag is set during retrieval of the cache copy tag in accordance with a snoop request from a remote node, cache state I is responded independently of cache states; and wherein, when the cache copy tag is retrieved in accordance with a cache miss request from a local node, the cache state is responded even though the replace flag is set.
  • 11. A multi-processor system comprising: one or more processors, one or more memory units, a node controller to connect among them, and a plurality of nodes connected to each other via a system connection unit, wherein each memory unit has one of a plurality of main storage portions constituting part of a main storage shared by processors of the plurality of nodes; wherein the processor issues at least two types of access instructions to the memory unit, i.e., a read instruction capable of allowing a processor cache to share data with another processor cache and a read invalidate instruction capable of sharing another processor cache; wherein each node controller has one or more configuration switches; wherein the configuration switch has a snoop filter switch to control a snoop operation for the processor included in the node; wherein the snoop filter switch, when enabled, and when no snoop is needed for a processor on a remote node in response to a snoop transaction issued from the node, issues no snoop to the processor bus and, when disabled, surely issues a snoop to the processor bus; wherein a first node has at least two processors; wherein, when a snoop filter is disabled only for a snoop filter switch of a second node, a first processor belonging to the first node issues a read instruction to an address corresponding to the memory unit for the first node; wherein a snoop transaction is issued to an address corresponding to the processor bus for the second node; wherein data is then responded to a read instruction requested on the processor bus for the first node; wherein a second processor belonging to the first node then issues a read instruction to the same address; wherein, in this case, data in response to the read instruction issued from the second processor is responded to the processor bus for the first node in a shorter time period than required for the read instruction issued from the first processor; wherein the first processor belonging to the first node issues a read instruction to an address corresponding to the memory unit of the second node; wherein a snoop transaction is issued to an address corresponding to the processor bus for the second node; wherein data is then responded to a read instruction requested on the processor bus for the first node; wherein a second processor belonging to the first node then issues a read instruction to the same address; and wherein, in this case, data in response to the read instruction issued from the second processor is responded to the processor bus for the first node in the same time period as required for the read instruction issued from the first processor.
  • 12. The multi-processor system according to claim 11, wherein there is provided a third node in addition to the first and second nodes; wherein a first processor belonging to the first node issues a read invalidate instruction to an address corresponding to a relevant memory unit of the third node; wherein a snoop transaction is issued to an address corresponding to the processor bus for the second node; wherein data is then responded to a read invalidate instruction requested on the processor bus for the first node; wherein a second processor belonging to the second node then issues a read instruction to the same address as the read invalidate instruction; wherein data is responded to a read instruction requested on the processor bus for the second node; wherein a second processor belonging to the first node further issues a read invalidate request to the same address as used for the first processor to issue the instruction; and wherein, in this case, data in response to the read invalidate instruction issued from the second processor is responded to the processor bus for the first node in the same time period as required for the read invalidate instruction issued from the first processor.
  • 13. The multi-processor system according to claim 11, wherein the node controller includes a cache hit control switch as a relevant setup switch; wherein, when first and second nodes are provided, the first node has at least two processors; wherein a snoop filter is disabled only for a snoop filter switch of the second node; wherein a first processor belonging to the first node issues a read instruction to an address corresponding to the memory unit of the first node; wherein a snoop transaction is issued to an address corresponding to the processor bus for the second node; wherein data is then responded to a read instruction requested on the processor bus for the first node; wherein a second processor belonging to the first node then issues a read instruction to the same address; wherein, when the cache hit control switch is enabled, data in response to the read instruction issued from the second processor is responded to the processor bus for the first node in a shorter time period than required for the read instruction issued from the first processor; and wherein, when the cache hit control switch is disabled, data in response to the read instruction issued from the second processor is responded to the processor bus for the first node in the same time period as required for the read instruction issued from the first processor.
  • 14. A cache coherency control method for a multi-processor system having a plurality of nodes that are connected with each other via a system connection unit and each include a processor bus connecting with one or more processors and a node controller connecting with one or more memory units, wherein each node controller has a cache copy tag to maintain address tags and cache states of all addresses cached by a processor belonging to a local node; and wherein the cache coherency control method comprises the steps of: determining whether or not an inter-cache transfer occurs as a result of local node snooping when a cache miss causes a memory access request to be issued to a processor bus; issuing a memory read request to memory corresponding to a relevant address in case of no inter-cache transfer occurred and broadcasting a snoop request to each node; retrieving a relevant cache copy tag for a local node and determining whether or not there is an entry corresponding to the address; determining whether or not a cache state is exclusive when an entry is found; determining whether or not a cache state is shared, when not exclusive, and a requested cache miss permits shared state; skipping a step of awaiting a snoop result from each node in case of presence of the entry and awaiting data returned from memory; awaiting a snoop result from each node; determining whether or not memory contains most recent data in accordance with a snoop result from each node; and awaiting data returned from memory in case of presence of most recent data in memory; awaiting inter-cache transfer data returned from memory in case of absence of most recent data in memory; and returning returned data to a processor.
  • 15. The cache coherency control method according to claim 14, wherein the node controller has a cache replacement control switch to enable or disable cache replacement; and wherein, when the cache replacement control switch is enabled during replacement from the cache, the cache coherency control method includes the step of invalidating a cache state of an entry corresponding to the cache copy tag.
  • 16. The cache coherency control method according to claim 14, comprising the steps of: determining a memory position in the cache during replacement from the cache; and invalidating a cache state of an entry corresponding to the cache copy tag when the memory position indicates a remote node.
  • 17. The cache coherency control method according to claim 14, wherein an entry of the cache copy tag includes a replace flag indicating a replaced line; and wherein the cache coherency control method comprises the steps of: clearing the replace flag to ensure an entry in the cache copy tag; setting a replace flag for an entry corresponding to the cache copy tag during replacement from the cache; determining whether or not the cache copy tag contains a requested cache line when a request for cache copy tag retrieval arrives from a remote node; determining whether or not the replace flag is set when the cache copy tag contains a requested cache line; invalidating the cache state when the replace flag is set, clearing the replace flag, and responding unsuccessful retrieval hit; and responding successful retrieval hit when the replace flag is not set.
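The replace-flag behavior recited in claims 5, 10, and 17 can be sketched roughly as below. This is only an illustration under stated assumptions: the class and method names (`CopyTag`, `fill`, `replace`, `local_lookup`, `remote_snoop_lookup`) are hypothetical, the copy tag is modeled as a plain dictionary keyed by address rather than as a set-associative structure, and an address with no entry is treated as state I.

```python
from dataclasses import dataclass
from enum import Enum


class CacheState(Enum):
    I = "invalid"     # no cached copy
    S = "shared"
    E = "exclusive"


@dataclass
class CopyTagEntry:
    state: CacheState = CacheState.I
    replaced: bool = False  # replace flag: line was evicted from the processor cache


class CopyTag:
    """Hypothetical per-node cache copy tag with a replace flag per entry."""

    def __init__(self):
        self.entries = {}  # address -> CopyTagEntry (simplified, not set associative)

    def fill(self, addr, state):
        # The processor caches the line: register the state, clear the replace flag.
        self.entries[addr] = CopyTagEntry(state, False)

    def replace(self, addr):
        # The processor evicts the line: keep the state, only set the replace flag.
        entry = self.entries.get(addr)
        if entry is not None:
            entry.replaced = True

    def local_lookup(self, addr):
        # Cache miss from the local node: the state is reported even if the flag is set.
        entry = self.entries.get(addr)
        return entry.state if entry is not None else CacheState.I

    def remote_snoop_lookup(self, addr):
        # Snoop from a remote node: a flagged entry is invalidated and reported as I.
        entry = self.entries.get(addr)
        if entry is None:
            return CacheState.I
        if entry.replaced:
            entry.state = CacheState.I
            entry.replaced = False
            return CacheState.I
        return entry.state
```

In this sketch an evicted line stays visible to local lookups, so a local reaccess can still hit the copy tag, while the first remote snoop to that address invalidates the entry and reports a miss.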
Priority Claims (1)
Number Date Country Kind
2006-000028 Jan 2006 JP national