This application is based upon and claims the benefit of priority of the prior Japanese Application No. 2008-069377, filed on Mar. 18, 2008 in Japan, the entire contents of which are hereby incorporated by reference.
The present invention relates to a technique for executing a memory access request issued from a central processing unit (CPU) in an information processing device that includes two or more CPUs each having a cache memory.
Generally, a large-scale information processing device that includes CPUs and Input/Output (I/O) devices (a large-scale SMP (Symmetric Multiple Processor) information processing device, for example) has system boards each including CPUs with cache memories, a system controller, and I/O devices, so as to improve the processing capacity.
In such a large-scale information processing device, a control operation is performed to guarantee cache coherency between the system boards (a coherence control operation). Therefore, request broadcasting and snoop result exchanges are performed between the system controllers of the respective system boards (see JP-A 2006-72509 and JP-A 2006-202215, for example).
In a large-scale information processing device, however, the physical distance between the system controllers increases with the size of the device. As the device structure is expanded, the latency of each memory access becomes longer, and it becomes difficult to improve the performance of the entire information processing device. Also, as a larger number of I/O devices are mounted in the information processing device, the number of snoop requests increases accordingly. As a result, it also becomes difficult to secure sufficient throughput for the broadcast bus and each snoop control unit.
A known technique developed to counter the above problems shortens the access latency by skipping the snoop operation over the system boards and performing data communication between the CPUs in the local system board when the data at the target address is present in a cache memory of one of the CPUs mounted in the same system board.
Each of the CPUs 10 to 13 includes multilevel cache memories (two levels in this example). More specifically, the CPU 10 includes a first-level cache memory 10a and a second-level cache memory 10b, and the CPU 11 includes a first-level cache memory 11a and a second-level cache memory 11b. Likewise, the CPU 12 includes a first-level cache memory 12a and a second-level cache memory 12b, and the CPU 13 includes a first-level cache memory 13a and a second-level cache memory 13b.
The system board A further includes a system controller 40-1 that performs communication control on the memories (the first-level cache memories 10a and 11a, the second-level cache memories 10b and 11b, and the main memories 30 and 31 in this example) provided in the system board A. Likewise, the system board B further includes a system controller 40-2 that performs communication control on the memories (the first-level cache memories 12a and 13a, the second-level cache memories 12b and 13b, and the main memories 32 and 33 in this example) provided in the system board B.
With this arrangement, the system controllers 40-1 and 40-2 share the communication control on the memories (the first-level cache memories 10a to 13a, the second-level cache memories 10b to 13b, and the main memories 30 to 33 in this example) provided in the information processing device 100. Also, the system controller 40-1 and the system controller 40-2 have the same structures, except that the system controllers 40-1 and 40-2 perform the communication control on different memories. The system controller 40-1 and the system controller 40-2 are connected in such a manner that the system controllers 40-1 and 40-2 can communicate with each other.
The system controller 40-1 includes a cache TAG 46-1, a request transmission/reception unit 41-1, a local snoop control unit 42-1, a broadcast control unit 43-1, a global snoop control unit 44-1, and a memory access issuing unit 45-1.
The cache TAG 46-1 registers and holds specific address information for identifying cache data present in the cache memories (the first-level cache memories 10a and 11a, and the second-level cache memories 10b and 11b in this example; the same applies hereinafter) under its subject node (the system board A in this example; the same applies hereinafter).
The request transmission/reception unit 41-1 receives a memory access request to access a main memory (or a local memory).
More specifically, in a case where a memory access request is generated from the CPU 10, and the data to be detected in response to the memory access request is not found in the first-level cache memory 10a and the second-level cache memory 10b, the request transmission/reception unit 41-1 receives the memory access request (a read request) from the CPU 10. The request transmission/reception unit 41-1 then transmits the received memory access request to the local snoop control unit 42-1 described below. The request transmission/reception unit 41-1 then receives a global snoop request from the local snoop control unit 42-1 described later, and transmits the global snoop request to the broadcast control unit 43-1 described later. The global snoop request is issued to search all the cache memories (the first-level cache memories 10a to 13a and the second-level cache memories 10b to 13b in this example; the same applies hereinafter) provided in the information processing device 100 for the data to be accessed in response to the memory access request (hereinafter referred to simply as the target data).
The local snoop control unit 42-1 searches the cache memories under its subject node for the target data of the memory access request, and, based on the search result, determines an operation to be performed in response to the memory access request.
More specifically, when receiving the memory access request from the request transmission/reception unit 41-1, the local snoop control unit 42-1 performs an operation in response to the CPU that has issued the memory access request, by searching (snooping) the cache TAG 46-1 under its subject node for the access target address information (hereinafter referred to simply as the target address information) for identifying the target data of the memory access request.
In a case where there is a hit for the memory access request in the cache TAG 46-1 under its subject node as a result of the search, for example, the local snoop control unit 42-1 determines an operation in response to the memory access request, based on the search result. The operation to be performed in response to the memory access request is to issue a read request to read data in a main memory, to issue a purge request to a CPU to purge data in a cache memory, or the like. In a case where there is a miss for the memory access request in the cache TAG 46-1 under its subject node as a result of the search, for example, the local snoop control unit 42-1 cancels the local snoop control operation, and transmits a global snoop request to the request transmission/reception unit 41-1.
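The decision made by the local snoop control unit can be pictured with the following C sketch. All type and function names here are illustrative assumptions (the actual decision is made in hardware inside the system controller), and the choice between a main-memory read and a cache purge is reduced to a single modified flag for brevity.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the conventional local snoop decision
 * (all names are assumptions; the real logic is implemented in hardware). */
typedef struct { uint64_t addr; bool valid; bool modified; } tag_entry_t;
typedef struct { tag_entry_t *entries; size_t n; } cache_tag_t;

typedef enum {
    OP_READ_MAIN_MEMORY,   /* hit on an unmodified line: read the main memory */
    OP_PURGE_CPU_CACHE,    /* hit on a modified line: ask the owning CPU to purge it */
    OP_GLOBAL_SNOOP        /* miss: cancel the local snoop and issue a global snoop request */
} snoop_op_t;

static const tag_entry_t *cache_tag_lookup(const cache_tag_t *tag, uint64_t addr)
{
    for (size_t i = 0; i < tag->n; i++)   /* linear search stands in for an indexed TAG RAM */
        if (tag->entries[i].valid && tag->entries[i].addr == addr)
            return &tag->entries[i];
    return NULL;
}

static snoop_op_t local_snoop(const cache_tag_t *local_tag, uint64_t target_addr)
{
    const tag_entry_t *e = cache_tag_lookup(local_tag, target_addr);
    if (e == NULL)
        return OP_GLOBAL_SNOOP;
    return e->modified ? OP_PURGE_CPU_CACHE : OP_READ_MAIN_MEMORY;
}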
The broadcast control unit 43-1 transmits and receives global snoop requests to and from the request transmission/reception unit 41-1 of its subject node, and also transmits and receives global snoop requests to and from the system controller 40-2 of the other node (the system board B in this example; the same applies hereinafter).
More specifically, when receiving a global snoop request from the request transmission/reception unit 41-1, the broadcast control unit 43-1 transmits the global snoop request to the global snoop control unit 44-1 described later, and outputs (broadcasts) the global snoop request to the system controller 40-2 of the other node. When receiving a global snoop request from the system controller 40-2 of the other node, the broadcast control unit 43-1 transmits the global snoop request to the global snoop control unit 44-1.
The global snoop control unit 44-1 searches a cache memory under its subject node for the target data, and exchanges search results with the system controller 40-2 under the other node. Based on the search result in the system controller 40-2 under the other node and the search result of its own, the global snoop control unit 44-1 determines an operation to be performed in response to the memory access request.
More specifically, when receiving a global snoop request from the broadcast control unit 43-1, the global snoop control unit 44-1 searches the cache TAG 46-1 under its subject node for the target address information corresponding to the target data of the global snoop request, as an operation in response to the CPU that has issued the memory access request.
Meanwhile, when the global snoop control unit 44-2 of the other node receives a global snoop request from the broadcast control unit 43-1 of its subject node via the broadcast control unit 43-2 of the other node, the global snoop control unit 44-2 searches the cache TAG 46-2 under the other node for the target address information corresponding to the target data of the global snoop request. After that, the global snoop control units 44-1 and 44-2 exchange and combine the cache TAG search results (the result of the search on the cache TAG 46-1 conducted by the global snoop control unit 44-1, and the result of the search on the cache TAG 46-2 conducted by the global snoop control unit 44-2), so as to merge the cache statuses. Based on the result of the cache status merging, the global snoop control unit 44-1 of its subject node determines an operation to be performed in response to the memory access request.
For example, in a case where it becomes clear as a result of the merging of the cache statuses that the target data of the memory access request issued from the CPU 10 is present in the main memory 30 under its subject node, the global snoop control unit 44-1 issues a memory access request to the memory access issuing unit 45-1 under its subject node. In a case where it becomes clear as a result of the merging of the cache statuses that the target data of the memory access request issued from the CPU 10 is present in the cache memory 12a in the CPU 12 under the other node, the global snoop control unit 44-1 issues a memory access request to the CPU 12 under the other node.
The memory access issuing unit 45-1 executes a memory access request, based on an operation in response to a memory access request determined by the local snoop control unit 42-1 or the global snoop control unit 44-1.
The cache TAG 46-2, the request transmission/reception unit 41-2, the local snoop control unit 42-2, the broadcast control unit 43-2, the global snoop control unit 44-2, and the memory access issuing unit 45-2 provided in the system controller 40-2 are the same as the cache TAG 46-1, the request transmission/reception unit 41-1, the local snoop control unit 42-1, the broadcast control unit 43-1, the global snoop control unit 44-1, and the memory access issuing unit 45-1 of the system controller 40-1, respectively, except that the communication control operations are to be performed with respect to the first-level cache memories 12a and 13a, the second-level cache memories 12b and 13b, and the main memories 32 and 33.
The following is a description of an operation flow to be performed to access data that is present only in a local memory and is not present in any of the cache memories provided in the conventional large-scale information processing device 100.
As illustrated in
If the result of the search conducted in response to the memory access request indicates a miss in the cache TAG 46-1 under its subject node (indicated as “result=MISS” in
When receiving the global snoop request from the broadcast control unit 43-1, the global snoop control unit 44-1 of its subject node searches the cache TAG 46-1 under its subject node for the target address information corresponding to the target data of the global snoop request (see t6). Meanwhile, when the global snoop control unit 44-2 of the other node receives the global snoop request from the broadcast control unit 43-1, the global snoop control unit 44-2 searches the cache TAG 46-2 under the other node for the target address information corresponding to the target data of the global snoop request (see t7). The global snoop control units 44-1 and 44-2 of the respective nodes exchange the results of the searches on the cache TAGs 46-1 and 46-2 with each other, and combine the results so as to merge the cache statuses. Based on the result of the cache status merging, the global snoop control unit 44-1 determines the final operation in response to the fetch request (see t8).
If the target data of the fetch request is not detected from any of the cache memories, and the global snoop control unit 44-1 determines that the primary data corresponding to the target data of the fetch request is to be read from the main memory 30 under its subject node, the memory access issuing unit 45-1 issues a read request (indicated as “MS-RD-REQ” in
Next, an operation flow to be performed to access cache data present in a cache memory of its subject node in the conventional large-scale information processing device 100 is described.
As illustrated in
If the result of the search conducted in response to the memory access request indicates a hit in the cache TAG 46-1 under its subject node (indicated as “result=HIT” in
If it becomes clear that the target data of the fetch request is present in the first-level cache memory 11a in the CPU 11 under its subject node, and the local snoop control unit 42-1 determines that the cache data corresponding to the target data of the fetch request is to be read from the first-level cache memory 11a, the local snoop control unit 42-1 issues a read request (indicated as “CPBK-REQ” in
As described above, in the conventional large-scale information processing device 100, the global snoop control operation is omitted, and an access is made to a main memory under its subject node only in the following cases (1) to (6).
(1) Where the issued memory access request is a command fetch request, and the target data of the command fetch request is found as a shared type (a shared fetch request to simply fetch the target data from one of the cache memories provided in the information processing device 100) in the cache TAG 46-1 under its subject node.
(2) Where the issued memory access request is a command fetch request, and the target data of the command fetch request is found as an exclusive type (an exclusive-type fetch command to cause only one cache memory to store the target data among all the cache memories provided in the information processing device 100) in the cache TAG 46-1 under its subject node.
(3) Where the issued memory access request is a shared-type (load) fetch request, and the target data of the shared-type fetch request is found as a shared type in the cache TAG 46-1 under its subject node.
(4) Where the issued memory access request is a shared-type fetch request, and the target data of the shared-type fetch request is found as an exclusive type in the cache TAG 46-1 under its subject node.
(5) Where the issued memory access request is an exclusive-type (store) fetch request, and the target data of the exclusive-type fetch request is found as an exclusive type in the cache TAG 46-1 under its subject node.
(6) Where the issued memory access request is a block store request, and the target data of the block store request is found as an exclusive type in the cache TAG 46-1 under its subject node.
As described above, with the conventional technique, the global snoop control operation over the system boards in the information processing device 100 can be skipped, and a data transfer between the CPUs under its subject node can be activated, only when the target data of a memory access request is found in a local cache memory.
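Assuming the request types and cache TAG states named in cases (1) to (6), the conventional skip decision can be summarized by a small C sketch such as the following; the enum and function names are illustrative, not part of the actual design.

#include <stdbool.h>

/* Conventional skip decision, summarizing cases (1) to (6) above
 * ("found" is the state recorded for the target data in the local cache TAG). */
typedef enum { REQ_COMMAND_FETCH, REQ_SHARED_FETCH, REQ_EXCLUSIVE_FETCH, REQ_BLOCK_STORE } req_type_t;
typedef enum { TAG_MISS, TAG_SHARED, TAG_EXCLUSIVE } tag_state_t;

static bool can_skip_global_snoop(req_type_t req, tag_state_t found)
{
    switch (req) {
    case REQ_COMMAND_FETCH:   return found != TAG_MISS;      /* cases (1) and (2) */
    case REQ_SHARED_FETCH:    return found != TAG_MISS;      /* cases (3) and (4) */
    case REQ_EXCLUSIVE_FETCH: return found == TAG_EXCLUSIVE; /* case (5) */
    case REQ_BLOCK_STORE:     return found == TAG_EXCLUSIVE; /* case (6) */
    }
    return false;
}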
However, the above conventional technique can only cover the total capacity of the cache memories mounted in its subject node.
Also, in a case where there is a miss in all the cache memories under its subject node, the location of the latest data corresponding to the target data cannot be detected. Therefore, in such a case, it is necessary to perform the global snoop control operation over the system boards.
As a result, the rate at which an access can be started with the global snoop operation skipped is not sufficiently high, and the performance of the device might not be improved as desired.
The disclosed information processing device includes two or more nodes each having main memories, processors including cache memories, and a system controller that performs a control operation to guarantee cache coherency among the nodes. The system controller of at least one of the nodes includes a holding unit that holds the specific information about primary data that is present in the main memories under its subject node, with the cache data corresponding to the primary data not present in the cache memories of the nodes other than its subject node.
The disclosed memory control method for an information processing device that includes two or more nodes each having main memories, processors including cache memories, and a system controller that performs a control operation to guarantee cache coherency among the nodes includes: a memory access request receiving step of receiving a memory access request generated at its subject node that is one of the nodes; an access step of accessing a holding unit when the memory access request is received in the memory access request receiving step, the holding unit holding the specific information about primary data present in the main memories of its subject node, with the cache data corresponding to the primary data not present in the cache memories of the other nodes; and a local snoop control step of performing a local snoop control operation to guarantee cache coherency at its subject node, when the target data of the memory access request received in the receiving step corresponds to the specific information held by the holding unit.
The disclosed memory control device includes: main memories, processors having cache memories, and a system controller that performs a control operation to guarantee cache coherency between the memory control device and other memory control devices. The system controller includes a holding unit that holds the specific information about primary data that is present in the main memories of the memory control device, with the cache data corresponding to the primary data not present in the cache memories of the other memory control devices.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the embodiment, as claimed.
The following is a description of embodiments of the present invention, with reference to the accompanying drawings.
The information processing device 1 as an embodiment of the present invention is a large-scale SMP information processing device. As illustrated in
The system boards SB0 to SB15 are separated from one another by partitions (nodes) formed with physical boundaries (physical spaces) such as housings, boards, and chips. Accordingly, each system board serves as one node, which is an expansion unit of processing capacity in the information processing device 1.
The information processing device as an embodiment of the present invention and a local snoop control method are now described in detail, by way of examples of the system boards SB0 and SB1.
As illustrated in
Each of the CPUs 10 to 17 includes multilevel cache memories (two levels in this example). More specifically, the CPU 10 includes a first-level cache memory (a cache memory) 10a and a second-level cache memory (a cache memory) 10b, the CPU 11 includes a first-level cache memory 11a and a second-level cache memory 11b, the CPU 12 includes a first-level cache memory 12a and a second-level cache memory 12b, and the CPU 13 includes a first-level cache memory 13a and a second-level cache memory 13b. Likewise, the CPU 14 includes a first-level cache memory 14a and a second-level cache memory 14b, the CPU 15 includes a first-level cache memory 15a and a second-level cache memory 15b, the CPU 16 includes a first-level cache memory 16a and a second-level cache memory 16b, and the CPU 17 includes a first-level cache memory 17a and a second-level cache memory 17b.
In the following description, the data stored (or present) in the cache memories will be referred to as cache data. The cache data is a duplicate of primary data stored (or present) in the main memories (a memory duplicate).
The system controllers 50-1 and 50-2 perform control operations to guarantee cache coherency among all the system boards SB0 to SB15 provided in the information processing device 1.
Although the system boards provided in the information processing device 1 include the system boards SB0 to SB15 (see
The system controllers 50-1 and 50-2 are connected to each other via a bus, so that the system controllers 50-1 and 50-2 can communicate with each other. The system controllers 50-1 and 50-2 share communication control operations for the memories provided in the information processing device 1 (in this example, the first-level cache memories 10a to 17a, the second-level cache memories 10b to 17b, and the main memories 30 to 33). The system controller 50-1 and the system controller 50-2 have the same structures, except that the system controllers 50-1 and 50-2 perform communication control operations on different memories from each other. More specifically, the system controller 50-1 performs communication control operations on the CPUs 10 to 13, the I/O devices 20 and 21, and the main memories 30 and 31. The system controller 50-2 performs communication control operations on the CPUs 14 to 17, the I/O devices 22 and 23, and the main memories 32 and 33.
The system controllers 50-1 and 50-2 each have a mechanism to perform a control operation to guarantee cache coherency beyond the boundaries between the system boards SB0 to SB15 provided in the information processing device 1 (a coherence control operation). The cache coherency is the consistency of data to be maintained, so that the latest correct cache data can be accessed even when a data update is performed for each set of cache data present in cache memories corresponding to the same primary data.
The system controller 50-1 includes a cache TAG (the first holding unit) 52-1, a virtual TAG expansion (VTAGX; the holding unit or the second holding unit) 57-1, a request transmission/reception unit 51-1, a broadcast control unit 53-1, a global snoop control unit 54-1, a local snoop control unit 55-1, and a memory access issuing unit 56-1.
The cache TAG 52-1 registers and holds (stores) the address information (the specific information) for identifying cache data present in the cache memories (the first-level cache memories 10a to 13a and the second-level cache memories 10b to 13b in this example; the same applies hereinafter) under its subject node (the local system board, or the system board SB0 in this example; the same applies hereinafter). The cache TAG 52-1 can be formed by a known technique, and therefore, a detailed explanation of it is omitted here.
The VTAGX 57-1 registers and holds the address information (the specific information) for identifying primary data that is present in the main memories (the local memories) 30 and 31 under its subject node, with the cache data corresponding to the primary data not present in the cache memories (not illustrated in
The VTAGX 57-1 registers the address information of such data size that can be read by a CPU in one operation, and is managed for each line size of the cache memories. The VTAGX 57-1 also stores a valid bit (the state information) indicating whether the address information is in a valid state or in an invalid state in association with the address information. Where the valid bit indicates a valid state, the address information registered in the VTAGX 57-1 can be detected through the snooping by the local snoop control unit 55-1 and the global snoop control unit 54-1 described later. Where the valid bit indicates an invalid state, the address information registered in the VTAGX 57-1 cannot be detected through the snooping by the local snoop control unit 55-1 and the global snoop control unit 54-1 described later.
The VTAGX 57-1 may also store a series of address information forming successive sets of address information on the address boundaries. By simultaneously managing the addresses of successive lines at the VTAGX 57-1, the space efficiency of the random access memory (RAM) can be increased. In this case, valid bits associated with the sets of address information in the address information series are registered independently of one another in the VTAGX 57-1.
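The following C sketch illustrates one possible layout of a VTAGX entry. The 256-byte line size and the four consecutive lines per entry are assumptions made only for illustration; the text states only that entries are managed per cache line size, that each registered address carries a valid bit, and that consecutive lines on an address boundary may share one entry with independent valid bits.

#include <stdbool.h>
#include <stdint.h>

#define VTAGX_LINE_SIZE        256u   /* assumed cache line size */
#define VTAGX_LINES_PER_ENTRY    4u   /* assumed number of consecutive lines per entry */

typedef struct {
    uint64_t base_addr;                     /* boundary-aligned base address of the run of lines */
    bool     valid[VTAGX_LINES_PER_ENTRY];  /* independent valid bit per consecutive line */
} vtagx_entry_t;

/* A registered line can be detected by the snooping only while its valid bit is set. */
static bool vtagx_entry_hit(const vtagx_entry_t *e, uint64_t addr)
{
    uint64_t span = (uint64_t)VTAGX_LINE_SIZE * VTAGX_LINES_PER_ENTRY;
    if (addr < e->base_addr || addr >= e->base_addr + span)
        return false;
    return e->valid[(addr - e->base_addr) / VTAGX_LINE_SIZE];
}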
The request transmission/reception unit 51-1 receives a memory access request issued from the CPUs 10 to 17 or the I/O devices 20 to 23, and a global snoop request to perform a global snoop control operation. Through the global snoop control operation, an operation to be performed in response to a memory access request is determined by performing a control operation over the nodes to guarantee cache coherency between all the nodes (the system boards SB0 to SB15 in
When receiving a memory access request, the request transmission/reception unit 51-1 transmits the memory access request to the local snoop control unit 55-1 described later. When receiving a global snoop request for the global snoop control from the local snoop control unit 55-1, the request transmission/reception unit 51-1 transmits the global snoop request to the global snoop control unit 54-1 described later.
More specifically, in a case where a memory access request is issued from the CPU 10, and a cache miss with respect to the data to be accessed in response to the memory access request occurs in the first-level cache memory 10a and the second-level cache memory 10b, the request transmission/reception unit 51-1 receives the memory access request (the read request) issued from the CPU 10. The request transmission/reception unit 51-1 then transmits the memory access request to the local snoop control unit 55-1 described later. When receiving a global snoop request from the local snoop control unit 55-1 after that, the request transmission/reception unit 51-1 transmits the global snoop request to the broadcast control unit 53-1 described later.
The local snoop control unit 55-1 performs the local snoop control operation (the local snooping) in a case where a memory access request is issued under its subject node, and the target data of the memory access request corresponds to the address information stored in the cache TAG 52-1 or the VTAGX 57-1 under its subject node. Through the local snoop control operation, an operation to be performed in response to the memory access request is determined by performing a control operation to guarantee cache coherency under its subject node. Accordingly, in the above case, the local snoop control unit 55-1 performs a control operation to guarantee cache coherency only within a closed range in the local system board as its subject node. In this manner, the local snoop control unit 55-1 guarantees cache coherency between all the nodes provided in the information processing device 1.
If the target data of the memory access request does not correspond to any of the address information stored in the cache TAG 52-1 or the VTAGX 57-1 under its subject node, the local snoop control unit 55-1 transmits a global snoop request to the request transmission/reception unit 51-1.
As illustrated in
The request port unit 66-1 sequentially stores (holds) requests received from the request transmission/reception unit 51-1.
The request selecting unit 67-1 selects a request from the requests stored in the request port unit 66-1.
The pipeline unit (a local snoop control unit) 68-1 performs a local snoop control operation on the target data of the request selected by the request selecting unit 67-1.
More specifically, when receiving a memory access request from the request transmission/reception unit 51-1, the pipeline unit 68-1 searches (snoops) the cache TAG 52-1 and the VTAGX 57-1 under its subject node for the address information to be accessed (hereinafter referred to simply as the target address information) for identifying the target data of the memory access request as an operation in response to the CPU that has issued the memory access request.
If there is a hit in the cache TAG 52-1 or the VTAGX 57-1 under its subject node as a result of the search in response to the memory access request, the pipeline unit 68-1 determines an operation to be performed in response to the memory access request, based on the search result. In this case, the pipeline unit 68-1 notifies that the latest cache data corresponding to the target data is present in the cache memory or the main memory under its subject node. By doing so, the pipeline unit 68-1 guarantees that the cache data in the cache memory under its subject node has not been updated. The operation to be performed in response to the memory access request is to issue a request to read the data in the main memory or a request for the CPU to purge the data in the cache memory.
If there is a miss in both the cache TAG 52-1 and the VTAGX 57-1 under its subject node as a result of the search, the pipeline unit 68-1 cancels the local snoop control operation, and transmits a global snoop request to the request transmission/reception unit 51-1.
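This hit/miss decision can be summarized by the C sketch below, in which the two lookup results are passed in as flags so that the example stays self-contained; the names are illustrative assumptions.

#include <stdbool.h>

/* Local snoop decision with the VTAGX added: a hit in either structure lets
 * the controller guarantee coherency within the subject node and skip the
 * global snoop; a miss in both falls back to a global snoop request. */
typedef enum {
    LS_HANDLE_LOCALLY,      /* read the local main memory or purge a local CPU cache */
    LS_ISSUE_GLOBAL_SNOOP   /* location of the latest data unknown: broadcast */
} local_snoop_result_t;

static local_snoop_result_t pipeline_local_snoop(bool cache_tag_hit, bool vtagx_hit)
{
    if (cache_tag_hit)
        return LS_HANDLE_LOCALLY;   /* latest data is cached under the subject node */
    if (vtagx_hit)
        return LS_HANDLE_LOCALLY;   /* latest data exists only in the local main memory */
    return LS_ISSUE_GLOBAL_SNOOP;
}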
The broadcast control unit 53-1 transmits and receives global snoop requests to and from the request transmission/reception unit 51-1 of its subject node, and also transmits and receives global snoop requests to and from the system controllers of all the nodes (the system controller 50-2 as the other node in this example; the same applies hereinafter) other than its subject node in the information processing device 1.
More specifically, when receiving a global snoop request from the request transmission/reception unit 51-1, the broadcast control unit 53-1 transmits the global snoop request to the global snoop control unit 54-1 described later, and broadcasts the global snoop request to the system controller 50-2 as the other node. In this manner, the broadcast control unit 53-1 broadcasts only the memory access requests that are determined to be impossible for the local snoop control unit 55-1 to handle.
When receiving a global snoop request from the system controller 50-2 as the other node, the broadcast control unit 53-1 transmits the global snoop request to the global snoop control unit 54-1.
The global snoop control unit 54-1 performs a global snoop control operation. As illustrated in
The request port unit 61-1 sequentially stores (holds) global snoop requests received from the broadcast control unit 53-1.
The request selecting unit 62-1 selects a global snoop request from the global snoop requests stored in the request port unit 61-1.
The pipeline unit (a global snoop control unit) 63-1 performs a global snoop control operation on the target data of the global snoop request selected by the request selecting unit 62-1.
The pipeline unit 63-1 detects the target data of the global snoop request from the cache memory under its subject node, and exchanges the search results with the system controller 50-2 as the other node. In this manner, the pipeline unit 63-1 determines an operation to be performed in response to the memory access request, based on the combined result of the search result of the system controller 50-2 as the other node and the search result of its own.
More specifically, in response to the CPU that has issued the memory access request, the pipeline unit 63-1 searches the cache TAG 52-1 under its subject node for the target address information corresponding to the target data of the global snoop request selected by the request selecting unit 62-1.
When the global snoop control unit 54-2 (or the pipeline unit: not illustrated) of the other node receives a global snoop request from the broadcast control unit 53-1 of the subject node via the broadcast control unit 53-2 of the other node, the global snoop control unit 54-2 searches the cache TAG 52-2 under the other node for the target address information corresponding to the target data of the global snoop request. After that, the pipeline unit 63-1 and the global snoop control unit 54-2 of the respective nodes exchange, via the communication unit 64-1 described later, and combine the cache TAG search results (the result of the search of the cache TAG 52-1 from the pipeline unit 63-1, and the result of the search of the cache TAG 52-2 from the global snoop control unit 54-2 in this example), so as to merge the cache statuses. Based on the result of the merging of the cache statuses, the pipeline unit 63-1 determines an operation to be performed in response to the memory access request.
In a case where it becomes clear as a result of the merging of the cache statuses that the target data of the memory access request issued from the CPU 10 is not present in any of the cache memories provided in the information processing device 1, and the main memory 30 under its subject node is to be accessed, the pipeline unit 63-1 issues a memory access request to the memory access issuing unit 56-1 under its subject node. In a case where it becomes clear as a result of the merging of the cache statuses that the target data of the memory access request issued from the CPU 10 is present in the cache memory 14a in the CPU 14 under the other node, the pipeline unit 63-1 issues a memory access request to the CPU 14 under the other node.
Accordingly, when receiving a global snoop request from the broadcast control unit 53-1, the pipeline unit 63-1 searches the cache TAG 52-1 under its subject node and the cache TAG 52-2 under the other node for the target address information corresponding to the target data of the global snoop request. The pipeline unit 63-1 then notifies the entire information processing device 1 that the CPU is to access the main memories 30 and 31 under its subject node, and receives a response. In this manner, the pipeline unit 63-1 performs the control operation to guarantee cache coherency in the entire information processing device 1.
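The exchange and merging of search results can be pictured with the following C sketch, simplified to a hit/miss report per node; the actual merge also distinguishes shared and exclusive cache states, so the names and encoding here are assumptions for illustration only.

#include <stdbool.h>

typedef enum { GS_READ_HOME_MAIN_MEMORY, GS_FETCH_FROM_REMOTE_CPU } gs_action_t;

/* tag_hit[i] is the (simplified) search result reported by node i's cache TAG
 * after the exchange between the system controllers; owner_node receives the
 * index of the node whose cache holds the latest data, or -1 on a full miss. */
static gs_action_t decide_global_snoop_action(const bool tag_hit[], int n_nodes, int *owner_node)
{
    for (int i = 0; i < n_nodes; i++) {
        if (tag_hit[i]) {
            *owner_node = i;
            return GS_FETCH_FROM_REMOTE_CPU;   /* request the data from a CPU of that node */
        }
    }
    *owner_node = -1;
    return GS_READ_HOME_MAIN_MEMORY;           /* miss everywhere: read the home main memory */
}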
The pipeline unit 63-1 also functions as a registration unit and an invalidation unit, as well as the above described unit.
In a case where it becomes clear as a result of the global snoop control operation that the cache data corresponding to the primary data present in the main memories 30 and 31 of its subject node is not present in any cache memory of any node in the information processing device 1, the pipeline unit (hereinafter referred to as the registration unit) 63-1 registers the address information identifying the primary data in the VTAGX 57-1, and causes the VTAGX 57-1 to hold the address information. The registration unit 63-1 notifies the entire information processing device 1 that the CPU is to access a memory, and receives a response. If it becomes clear from the response that the cache data corresponding to the target data is not registered in any of the cache memories provided in the information processing device 1, the registration unit 63-1 registers the address information identifying the primary data corresponding to the target data in the VTAGX 57-1 under its subject node. Accordingly, when a memory access request issued from the CPUs (local CPUs) 10 to 13 under its subject node is directed to the main memories 30 and 31 under its subject node, and an access to the main memories 30 and 31 under its subject node is to be made as a result of a global snoop operation (a miss), the registration unit 63-1 registers the corresponding address information in the VTAGX 57-1 under its subject node.
In this embodiment, the VTAGX 57-1 is managed by the set associative method, and, if there is no empty entry at the time of new entry registration, the registration unit 63-1 selects the entry to be used for the new registration in accordance with the Least Recently Used (LRU) policy between WAYs. Even if a valid entry holding the same address information as the address information to be registered already exists, the registration unit 63-1 does not perform any special processing on that entry, and simply replaces it with the address information to be registered.
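The registration policy can be sketched in C as follows, assuming a two-WAY set-associative VTAGX with 1024 sets, indexing by a 256-byte line address, and a single LRU flag per set; only the set associative organization and the LRU replacement between WAYs come from the description above, and the rest is illustrative.

#include <stdbool.h>
#include <stdint.h>

#define VTAGX_SETS 1024u
#define VTAGX_WAYS    2u

typedef struct { uint64_t line_addr; bool valid; } vtagx_way_t;
typedef struct { vtagx_way_t way[VTAGX_WAYS]; unsigned lru_way; } vtagx_set_t;
typedef struct { vtagx_set_t set[VTAGX_SETS]; } vtagx_t;

static void vtagx_register(vtagx_t *v, uint64_t line_addr)
{
    vtagx_set_t *s = &v->set[(line_addr / 256u) % VTAGX_SETS];
    unsigned w = s->lru_way;                           /* default victim: LRU WAY */

    for (unsigned i = 0; i < VTAGX_WAYS; i++) {
        if (s->way[i].valid && s->way[i].line_addr == line_addr) {
            w = i;                                     /* same address already valid: overwrite it */
            break;
        }
        if (!s->way[i].valid)
            w = i;                                     /* otherwise prefer an empty WAY */
    }

    s->way[w].line_addr = line_addr;                   /* no separate handling of the old contents */
    s->way[w].valid     = true;
    s->lru_way = (w + 1u) % VTAGX_WAYS;                /* simple LRU update for two WAYs */
}

With only two WAYs, a single lru_way flag per set is a complete LRU record; a larger associativity would need per-set LRU state.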
When functioning as an invalidation unit, the pipeline unit (hereinafter referred to as the invalidation unit) 63-1 invalidates the address information that is stored in the VTAGX 57-1 under its subject node and corresponds to primary data present in the main memories 30 and 31 of its subject node. This invalidation is performed when an operation in response to a memory access request is determined in a case where the memory access request is issued at one of the nodes (the system board SB1 in this example) other than its subject node in the information processing device 1, and the target data of the memory access request is the primary data that is present in the main memories 30 and 31 of its subject node and corresponds to the address information stored in the VTAGX 57-1 under its subject node.
More specifically, when the entire information processing device 1 is notified of an access request for the CPUs 14 to 17 under the other node to access the main memories 30 and 31 under its subject node, the invalidation unit 63-1 checks the target address information of this access request against the address information stored in the VTAGX 57-1 under its subject node. If the address information matching the target address information is stored in the VTAGX 57-1 under its subject node, the invalidation unit 63-1 changes the state of the valid bit corresponding to the address information from a valid state into an invalid state, so as to guarantee cache coherency in the entire information processing device 1. Accordingly, when an access is determined to be made in a case where the access destination of a memory access request from the CPUs 14 to 17 (the remote CPUs) under the other node is the data having its address registered in the VTAGX 57-1 under its subject node, the invalidation unit 63-1 invalidates the corresponding entry in the VTAGX 57-1 under its subject node.
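Continuing the registration sketch above (and reusing its assumed vtagx_t layout), the invalidation amounts to clearing the valid bit of the matching entry when a remote node's access to that address is determined:

/* Invalidate the VTAGX entry, if any, that matches the address targeted by a
 * memory access determined for a remote CPU; the cleared valid bit makes the
 * entry invisible to later local snoops, which keeps the device coherent. */
static void vtagx_invalidate(vtagx_t *v, uint64_t remote_target_line_addr)
{
    vtagx_set_t *s = &v->set[(remote_target_line_addr / 256u) % VTAGX_SETS];
    for (unsigned i = 0; i < VTAGX_WAYS; i++)
        if (s->way[i].valid && s->way[i].line_addr == remote_target_line_addr)
            s->way[i].valid = false;     /* valid -> invalid */
}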
The communication unit 64-1 performs communications with the pipeline unit 63-1 and the global snoop control units (the global snoop control unit 54-2 of the other node in this example) of all the other nodes than its subject node in the information processing device 1.
The advance registration requesting unit (the extended specific information generating unit) 65-1 generates advance address information (the extended specific information) that is address information different from the address information stored in the VTAGX 57-1.
The advance registration requesting unit 65-1 is provided for the following reasons.
In a case where the registration unit 63-1 registers the corresponding address information in the VTAGX 57-1 only when a fetch request from the CPU (the local CPU) under its subject node is executed, the target data corresponding to the address information is purged from the cache memory after the registration. When the CPU under its subject node again accesses the target data corresponding to the address information, a hit occurs in the VTAGX 57-1 for the first time. Accordingly, only when an access is made for the second time or later can the global snoop operation be skipped on a hit in the VTAGX 57-1 and the latency be shortened. Thus, registration performed only at the time of such a fetch request is effective when reusable data is accessed for the second time or later, but is not effective when an access is made for the first time.
In view of this, the information processing device 1 of this embodiment includes the advance registration requesting unit 65-1. When address information is registered in the VTAGX 57-1, the advance registration requesting unit 65-1 utilizes the continuity of memory accesses made by a program, and registers in advance, into the VTAGX 57-1, the address information located several kilobytes ahead of the registered address information.
Next, an advance registration operation to be performed with the use of the advance registration requesting unit 65-1 is described in detail.
In a case where there is not a hit in any of the cache memories provided in the information processing device 1 in response to an access request from a CPU as a result of a global snoop control operation, and an access is determined to be made to the local memory of its subject node as an operation in response to the CPU, the advance registration requesting unit 65-1 first generates advance address information by adding a predetermined amount (several Kbytes, for example) to the target address information corresponding to the access request at all the system controllers provided in the information processing device 1. Here, all the system controllers are the system controllers (not illustrated) provided for the respective system boards SB0 to SB15. For convenience, the system controller 50-1 provided for the system board SB0 is described here as a representative one of all the system controllers provided in the information processing device 1. In a case where the target address information corresponding to the access request is successfully registered in the VTAGX 57-1, the advance registration requesting unit 65-1 generates the advance address information that is the address information predicted to be accessed after the target address information. The advance registration requesting unit 65-1 includes a request port (the advance registration requesting port; not illustrated) that sequentially accumulates (stores) the generated advance address information.
When the request port is available, the advance registration requesting unit 65-1 sets the generated advance address information into the request port. If effective advance address information (an advance registration request) is present in the advance address information accumulated in the request port, the advance registration requesting unit 65-1 sequentially inputs the effective advance address information to the request port unit 61-1. Here, the effective advance address information indicates that the primary data corresponding to the advance address information is present in the main memories 30 and 31 of its subject node. Accordingly, if the primary data corresponding to the advance address information exists in the main memories 30 and 31 of its subject node, the advance registration requesting unit 65-1 causes the advance address information to take part in the request selecting operation for the global snoop control, together with a memory access request broadcast from the other node.
After that, the pipeline unit 63-1 performs the global snoop control operation for the advance address information selected by the request selecting unit 62-1. More specifically, the pipeline unit 63-1 determines whether the cache data corresponding to the advance address information is present in one of the cache memories of the nodes other than its subject node in the information processing device 1. If the result of the global snoop control performed for the advance address information indicates that there is not a hit (there is a cache miss) in the cache TAGs of all the nodes provided in the information processing device 1, the pipeline unit 63-1 registers the advance address information in the VTAGX 57-1 under its subject node. If the result of the global snoop control performed for the advance address information indicates that there is a hit in one of the cache TAGs of the nodes provided in the information processing device 1, the pipeline unit 63-1 does not register the advance address information in the VTAGX 57-1, and ends the operation.
Accordingly, in a case where the result of the global snoop control performed for the advance address information generated by the advance registration requesting unit 65-1 indicates that the primary data corresponding to the advance address information is present in the main memories 30 and 31 of its subject node, and the cache data corresponding to the advance address information is not present in any of the cache memories of the nodes other than its subject node in the information processing device 1, the pipeline unit 63-1 registers the advance address information in the VTAGX 57-1 under its subject node, and causes the VTAGX 57-1 to hold the advance address information.
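Under the same assumptions as the registration sketch above, the advance registration can be outlined as follows. The 4-Kbyte stride and the two extern predicates are hypothetical stand-ins; the description states only that a predetermined amount of several Kbytes is added to the registered address, and that the advance address is registered only when its primary data is in a local main memory and the global snoop misses in every cache TAG.

#include <stdbool.h>
#include <stdint.h>

#define ADVANCE_STRIDE (4u * 1024u)   /* assumed "several Kbytes ahead" stride */

/* Hypothetical predicates standing in for the address decode and for the
 * merged result of the global snoop control performed on the advance address. */
extern bool addr_is_in_local_main_memory(uint64_t addr);
extern bool global_snoop_hits_any_cache_tag(uint64_t addr);

static void request_advance_registration(vtagx_t *v, uint64_t registered_addr)
{
    uint64_t advance_addr = registered_addr + ADVANCE_STRIDE;  /* predicted next access */

    if (!addr_is_in_local_main_memory(advance_addr))
        return;                                   /* only local primary data is eligible */
    if (global_snoop_hits_any_cache_tag(advance_addr))
        return;                                   /* some cache already holds the line */
    vtagx_register(v, advance_addr);              /* miss everywhere: register in advance */
}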
The memory access issuing unit 56-1 executes a memory access request for the main memories 30 and 31 under its subject node, based on an operation in response to a memory access request determined by the local snoop control unit 55-1 or the global snoop control unit 54-1.
The cache TAG 52-2, the VTAGX 57-2, the request transmission/reception unit 51-2, the local snoop control unit 55-2, the broadcast control unit 53-2, the global snoop control unit 54-2, and the memory access issuing unit 56-2 provided in the system controller 50-2 are the same as the cache TAG 52-1, the VTAGX 57-1, the request transmission/reception unit 51-1, the local snoop control unit 55-1, the broadcast control unit 53-1, the global snoop control unit 54-1, and the memory access issuing unit 56-1 of the system controller 50-1, respectively, except that the communication control operations are to be performed with respect to the first-level cache memories 14a to 17a, the second-level cache memories 14b to 17b, and the main memories 32 and 33.
The following is a description of an operation flow (the first example operation) to be performed in the information processing device 1 as an embodiment of the present invention, in a case where primary data present in a main memory under its subject node is to be accessed, the cache data corresponding to the primary data is not present in any of the cache memories provided in the information processing device 1, and the address information corresponding to the primary data is not registered in the VTAGX 57-1.
As illustrated in
If the result of the search conducted in response to the memory access request indicates a miss in both the cache TAG 52-1 and the VTAGX 57-1 under its subject node (indicated as “result=MISS” in
When receiving the global snoop request from the broadcast control unit 53-1, the global snoop control unit 54-1 of its subject node searches the cache TAG 52-1 under its subject node for the target address information corresponding to the target data of the global snoop request (see t6). Meanwhile, when the global snoop control unit 54-2 of the other node receives the global snoop request from the broadcast control unit 53-1, the global snoop control unit 54-2 searches the cache TAG 52-2 under the other node for the target address information corresponding to the target data of the global snoop request (see t7). The global snoop control units 54-1 and 54-2 of the respective nodes exchange the results of the cache TAG searches with each other, and combine the results so as to merge the cache statuses. Based on the result of the cache status merging, the global snoop control unit 54-1 determines the final operation in response to the fetch request (see t8; the global snoop control step).
If the target data of the fetch request is not found in any of the cache memories, and the global snoop control unit 54-1 determines that the primary data corresponding to the target data of the fetch request is to be read from the main memory 30, the global snoop control unit 54-1 also determines that registration in the VTAGX 57-1 is to be performed, and registers the address information corresponding to the primary data in the VTAGX 57-1 (see t9; the specific information registration step). Also, the memory access issuing unit 56-1 issues a read request (indicated as “MS-RD-REQ” in
The following is a description of an operation flow (the second example operation) to be performed in the information processing device 1 as an embodiment of the present invention, in a case where primary data present in a main memory under its subject node is to be accessed, and the address information corresponding to the primary data is registered in the VTAGX 57-1.
As illustrated in
If the result of the search conducted in response to the memory access request indicates a miss in the cache TAG 52-1 under its subject node but a hit in the VTAGX 57-1 under its subject node (indicated as “VTAGX=HIT” and “result=HIT” in
If it becomes clear that the target data of the fetch request is not present in any of the cache memories under its subject node, and the local snoop control unit 55-1 determines that the primary data corresponding to the target data of the fetch request is to be read from the main memory 30, the memory access issuing unit 56-1 issues a read request with respect to the fetch request, to the main memory 30 (see t5). The primary data corresponding to the fetch request is then read from the main memory 30 into the system controller 50-1 (indicated as “RD→DATA”; see t6 in
The following is a description of an operation flow (the third example operation) to be performed in the information processing device 1 as an embodiment of the present invention, in a case where primary data present in a main memory under its subject node is to be accessed, and the cache data corresponding to the primary data is present in a cache memory of one of the nodes other than its subject node in the information processing device 1, though the address information corresponding to the primary data is not registered in the VTAGX 57-1.
As illustrated in
If the result of the search conducted in response to the memory access request indicates a miss in both the cache TAG 52-1 and the VTAGX 57-1 under its subject node (indicated as “VTAGX=MISS” and “result=MISS” in
When receiving the global snoop request from the broadcast control unit 53-1, the global snoop control unit 54-1 of its subject node searches the cache TAG 52-1 under its subject node for the target address information corresponding to the target data of the global snoop request (see t6). Meanwhile, when the global snoop control unit 54-2 of the other node receives the global snoop request from the broadcast control unit 53-1, the global snoop control unit 54-2 searches the cache TAG 52-2 under the other node for the target address information corresponding to the target data of the global snoop request (see t7). The global snoop control units 54-1 and 54-2 of the respective nodes exchange the results of the cache TAG searches with each other, and combine the results so as to merge the cache statuses. Based on the result of the cache status merging, the global snoop control unit 54-1 determines the final operation in response to the fetch request (see t8; the global snoop control step).
If it becomes clear that the target data of the fetch request is present in the first-level cache memory 14a in the CPU 14 under the other node (the system board SB1 in this example), and the global snoop control unit 54-1 determines that the cache data corresponding to the target data of the fetch request is to be read from the first-level cache memory 14a, the global snoop control unit 54-2 issues a read request with respect to the fetch request, to the CPU 14 including the first-level cache memory 14a (see t9). The cache data corresponding to the fetch request is then read from the first-level cache memory 14a (the CPU 14) into the system controller 50-2 (indicated as “MODQ”; see t10 in
The following is a description of an operation flow (the fourth example operation) to be performed in the information processing device 1 as an embodiment of the present invention, in a case where an advance registration is successfully made in the VTAGX 57-1.
The procedures at t1 to t13 are the same as those in the first example operation described with reference to
As illustrated in
Each of the global snoop control units then performs a global snoop control operation on the advance address information generated by the advance registration requesting unit 65-1 (see t16 and t17). More specifically, the global snoop control unit 54-1 of its subject node searches the cache TAG 52-1 under its subject node for the advance address information generated by the advance registration requesting unit 65-1 (see t16). Meanwhile, the global snoop control unit 54-2 of the other node searches the cache TAG 52-2 under the other node for the same advance address information as the advance address information searched under the subject node (see t17). The global snoop control units 54-1 and 54-2 of the respective nodes exchange the results of the cache TAG searches with each other, and combine the results so as to merge the cache statuses. Based on the result of the cache status merging, the global snoop control unit 54-1 determines whether to register the advance address information in the VTAGX 57-1 under its subject node (see t18; the extended specific information registration step).
If the result of the global snoop control performed for the advance address information indicates a miss in all the cache TAGs, the global snoop control unit 54-1 registers the advance address information in the VTAGX 57-1 under its subject node, and the execution of the advance registration request is completed.
The following is a description of an operation flow (the fifth example operation) to be performed in the information processing device 1 as an embodiment of the present invention, in a case where an advance registration in the VTAGX 57-1 fails.
The procedures at t1 to t17 are the same as those in the first example operation described with reference to
As illustrated in
If the result of the global snoop control performed for the advance address information indicates a hit in one of the cache TAGs, the global snoop control unit 54-1 does not register the advance address information in the VTAGX 57-1 under its subject node, and the execution of the advance registration request is completed.
The following is a description of an operation flow (the sixth example operation) to be performed in the information processing device 1 as an embodiment of the present invention, in a case where address information registered in the VTAGX 57-1 is to be put into an invalid state.
As illustrated in
If the result of the search conducted in response to the memory access request indicates a miss in both the cache TAG 52-2 and the VTAGX 57-2 under the other node (indicated as “result=MISS” in
When receiving the global snoop request from the broadcast control unit 53-2, the global snoop control unit 54-2 under the other node searches the cache TAG 52-2 under the other node for the target address information corresponding to the target data of the global snoop request (see t6). Meanwhile, when the global snoop control unit 54-1 under its subject node receives the global snoop request from the broadcast control unit 53-2 under the other node, the global snoop control unit 54-1 searches the cache TAG 52-1 under its subject node for the target address information corresponding to the target data of the global snoop request (see t7). The global snoop control units 54-1 and 54-2 of the respective nodes exchange the results of the cache TAG searches with each other, and combine the results so as to merge the cache statuses. Based on the result of the cache status merging, the global snoop control unit 54-1 determines the final operation in response to the fetch request (see t8).
If the target data of the fetch request is not detected from any of the cache memories, and the global snoop control unit 54-2 determines that the primary data corresponding to the target data of the fetch request is to be read from the main memory 30 under its subject node, the global snoop control unit 54-1 under its subject node changes the valid bit corresponding to the address information stored in the VTAGX 57-1 from a valid state to an invalid state. In this manner, the global snoop control unit 54-1 under its subject node invalidates the address information in the VTAGX 57-1 (see t9; the invalidation step). Also, the memory access issuing unit 56-1 of its subject node issues a read request (indicated as “MS-RD-REQ” in
As described above, in the information processing device 1 as an embodiment of the present invention, the local snoop control unit 55-1 skips the global snoop control operation, and activates an access to a main memory under its subject node in the following cases (1) to (6) (see the remarks denoted by “circles” in
(1) Where the issued memory access request is a command fetch request, and the target data of the command fetch request is found as a shared type (a shared fetch request to simply fetch the target data from one of the cache memories provided in the information processing device 1) in the cache TAG 52-1 under its subject node (see section “1.3” in
(2) Where the issued memory access request is a command fetch request, and the target data of the command fetch request is found as an exclusive type (an exclusive-type fetch command to cause only one cache memory to store the target data among all the cache memories provided in the information processing device 1) in the cache TAG 52-1 under its subject node (see section “1.4” in
(3) Where the issued memory access request is a shared-type (load) fetch request, and the target data of the shared-type fetch request is found as a shared type in the cache TAG 52-1 under its subject node (see section “2.3” in
(4) Where the issued memory access request is a shared-type fetch request, and the target data of the shared-type fetch request is found as an exclusive type in the cache TAG 52-1 under its subject node (see section “2.4” in
(5) Where the issued memory access request is an exclusive-type (store) fetch request, and the target data of the exclusive-type fetch request is found as an exclusive type in the cache TAG 52-1 under its subject node (see section “3.5” in
(6) Where the issued memory access request is a block store request, and the target data of the block store request is found as an exclusive type in the cache TAG 52-1 under its subject node (see section “4.5” in
In the information processing device 1 as an embodiment of the present invention, the local snoop control unit 55-1 can skip the global snoop control operation, and activate an access to a main memory under its subject node in the following cases (7) to (12) (see the remarks denoted by “double circles” in
(7) Where the issued memory access request is a command fetch request, and the target data of the command fetch request is not found in the cache TAG 52-1 under its subject node, but is found in the VTAGX 57-1 under its subject node (see section “1.2” in
(8) Where the issued memory access request is a shared-type (load) fetch request, and the target data of the shared-type fetch request is not found in the cache TAG 52-1 under its subject node, but is found in the VTAGX 57-1 under its subject node (see section “2.2” in
(9) Where the issued memory access request is an exclusive-type (store) fetch request, and the target data of the exclusive-type fetch request is not found in the cache TAG 52-1 under its subject node, but is found in the VTAGX 57-1 under its subject node (see section “3.2” in
(10) Where the issued memory access request is an exclusive-type fetch request, and the target data of the exclusive-type fetch request is found as a shared type in the cache TAG 52-1 under its subject node, and is also found in the VTAGX 57-1 under its subject node (see section “3.4” in
(11) Where the issued memory access request is a block store request, and the target data of the block store request is not found in the cache TAG 52-1 under its subject node, but is found in the VTAGX 57-1 under its subject node (see section “4.2” in
(12) Where the issued memory access request is a block store request, and the target data of the block store request is found as a shared type in the cache TAG 52-1 under its subject node, and is also found in the VTAGX 57-1 under its subject node (see section “4.4” in
In the cases other than the cases (1) to (12), the global snoop control unit 54-1 performs the global snoop control operation (see
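The cases (1) to (12) above can be summarized as a single decision: the global snoop control operation can be skipped either when the hit in the cache TAG under the subject node is already sufficient for the requested access type, or when a hit in the VTAGX guarantees that no other node caches the target data. The sketch below is one reading of that decision table; the enumerations and the function name are hypothetical, and the embodiment implements the decision in the local snoop control unit hardware.

```cpp
// Hypothetical request and search-result types used only to summarize cases (1) to (12).
enum class RequestType { CommandFetch, SharedFetch, ExclusiveFetch, BlockStore };
enum class TagResult   { Miss, SharedHit, ExclusiveHit };

// Returns true when the local snoop control unit may skip the global snoop control
// operation and activate the access to a main memory under its subject node.
bool can_skip_global_snoop(RequestType request, TagResult cache_tag, bool vtagx_hit) {
    switch (request) {
    case RequestType::CommandFetch:
    case RequestType::SharedFetch:
        // Cases (1) to (4): any hit in the cache TAG under the subject node suffices.
        // Cases (7) and (8): a miss in the cache TAG is covered by a hit in the VTAGX.
        return cache_tag != TagResult::Miss || vtagx_hit;
    case RequestType::ExclusiveFetch:
    case RequestType::BlockStore:
        // Cases (5) and (6): an exclusive-type hit in the cache TAG under the subject node.
        // Cases (9) to (12): a miss or a shared-type hit in the cache TAG is covered by a
        // hit in the VTAGX, which guarantees that no other node caches the target data.
        return cache_tag == TagResult::ExclusiveHit || vtagx_hit;
    }
    return false;  // any other combination falls through to the global snoop control operation
}
```

In the remaining combinations (for example, an exclusive-type fetch request whose target data is found as a shared type in the cache TAG but is not found in the VTAGX), the function returns false, corresponding to the cases in which the global snoop control operation is performed.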
The operations to be performed by the system controller 50-1 according to the local snoop control method (the operations to be performed in the information processing device 1) as an embodiment of the present invention have been described so far. When a memory access request is issued from a CPU or an I/O device under any of the other system boards SB1 to SB15, the system controller under that system board performs the same operations as those described above for the system controller 50-1.
As described above, according to the local snoop control method as an embodiment of the present invention, the VTAGX 57-1, which has a larger capacity than the cache memories under its subject node, is added to each of the nodes provided in the information processing device 1, and a search of the VTAGX 57-1 is added to the operation of the local snoop control unit 55-1, so that the cache memory space under the subject node is virtually expanded. Accordingly, even if there is a miss in the cache TAG 52-1 under the subject node, an access to the target data in the main memories 30 and 31 under the subject node can be activated through a local snoop control operation, which is a low-latency data communication, as long as there is a hit in the VTAGX 57-1. In this manner, the global snoop control operation over the nodes in the information processing device 1 can be skipped while cache coherency among all the nodes in the information processing device 1 is still guaranteed, and the conditions under which the global snoop control operation is skipped are broadened. Accordingly, the latency of each memory access in the large-scale information processing device 1 can be shortened, the throughput of each snoop operation can be improved, and the busy ratios of the broadcast bus and the global snoop control operation can be lowered. As a result, the information processing device 1 can achieve higher performance.
In a case where an exclusive-type memory access request that involves invalidation of cache data is issued from the CPUs 10 to 13 under its subject node, and the target data of the memory access request is either found as a shared type in the cache TAG 52-1 or not found there at all, the operation to invalidate the cache data present in the cache memories under the other nodes becomes unnecessary if the target data of the memory access request is found in the VTAGX 57-1. Accordingly, the global snoop control operation over the nodes in the information processing device 1 is skipped in this case, an access is made to a main memory under the subject node, and an operation to invalidate the cache data present in a cache memory under the subject node can be activated.
Further, mounting the VTAGX 57-1 in the system controller 50-1 does not require any noticeable control change in the system controller of an existing large-scale SMP information processing device. Accordingly, the VTAGX 57-1 can be easily mounted in an existing large-scale SMP information processing device.
In a case where the target address information corresponding to the target data of a memory access request is registered in the VTAGX 57-1, advance address information several Kbytes ahead of the target address information is also registered in advance in the VTAGX 57-1 under certain conditions, taking advantage of the continuity of memory accesses made by the running program. Accordingly, even if an access from the CPUs 10 to 13 under its subject node is a first-time access, the corresponding address information may already be registered in the VTAGX 57-1. In this manner, even for such a first-time access, the global snoop control operation can be skipped while cache coherency among all the nodes in the information processing device 1 is guaranteed, as long as there is a hit in the VTAGX 57-1. Thus, the latency of each memory access in the large-scale information processing device 1 can be shortened, and the throughput of each snoop operation can be improved.
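As one illustration of this advance registration, the sketch below derives an advance address a fixed distance ahead of the target address. The distance and the line size are assumptions of the sketch; the embodiment states only that the advance address information lies several Kbytes ahead of the target address information and is registered under certain conditions.

```cpp
#include <cstdint>

// Assumed constants for illustration only; the embodiment does not fix these values.
constexpr uint64_t kLineSizeBytes   = 256;       // assumed cache line size
constexpr uint64_t kAdvanceDistance = 4 * 1024;  // assumed advance distance of a few Kbytes

// Derive the advance address information registered in the VTAGX together with the
// target address information, relying on the continuity of memory accesses made by
// the running program so that a later first-time access can still hit the VTAGX.
uint64_t make_advance_address(uint64_t target_addr) {
    uint64_t advance = target_addr + kAdvanceDistance;
    return advance & ~(kLineSizeBytes - 1);  // align to the assumed line boundary
}
```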
The present invention is not limited to the above embodiments, and various changes and modifications may be made to them without departing from the scope of the invention.
For example, a VTAGX is mounted in each of the nodes in the information processing device 1 in the above embodiment. However, the present invention is not limited to that structure, and a VTAGX may be mounted in one or some of the nodes in the information processing device 1.
Also, address information is used as the specific information in the above embodiment. However, any information by which primary data or cache data can be specified may be used as the specific information.
Further, a valid bit is used as the state information in the above embodiment. However, any information indicating whether the subject specific information is in a valid state or an invalid state may be used as the state information.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority or inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind
---|---|---|---
2008-069377 | Mar. 18, 2008 | JP | national