The embodiments discussed herein are related to an information processing system.
Building an information processing system from a plurality of interconnected nodes is effective for high-speed parallel processing. A parallel computer with a distributed shared memory can perform high-speed parallel computation. Each node of the information processing system includes an arithmetic processing unit (hereinafter called a CPU (Central Processing Unit)), a cache memory, and so on. The information processing system uses the cache memory of each node as the distributed shared memory.
In a distributed shared memory built from cache memories, the plurality of nodes share one another's cache memories, so the consistency of the cache memories must be controlled. This consistency control maintains cache coherence. A snoop cache is an effective way to maintain cache coherence.
In the snoop cache function, when the CPU of one node writes data held in its own cache memory, another node receives the write data via a shared bus and updates the corresponding data in its own cache memory. A directory system is utilized as a hardware mechanism to maintain cache coherence. The directory system holds information indicating which CPUs have cached the same data, and performs invalidation and updating of cache lines accordingly.
In a directory-based cache management system, when a request such as a read to an address of the memory is dispatched, information that can identify the snoop destination, such as the status, the node (board) identifier (ID: Identification), and the identifier (ID) of the CPU within the node (board), is registered in the directory.
The status field 103 indicates the holding status of the data: the exclusive state (Exclusive), the invalid state (Invalid), or the shared state (Shared) with one or two CPUs. The exclusive state indicates that the requestor CPU has exclusive control of the data (for example, the state after reading and before updating). The invalid state indicates that no CPU holds the data. The shared state indicates that a plurality of CPUs share the data. The CPU-ID fields 104 and 105 store the CPU-ID (Identification) of the CPU that issued the request (called the requestor).
For example, when a CPU issues a read request for data in the exclusive state (hereinafter referred to as the E state), for instance to update the data, the directory 100 is searched with the request address, and the holding status of the data is determined. When the search shows that the data at the request address is held in the shared state (hereinafter referred to as the S state), a snoop is sent to the CPU that holds the data, and the data is changed to the invalid state (hereinafter referred to as the I state). Likewise, when the requested data is held in the exclusive state, a snoop is sent to the CPU that holds the data, and the corresponding data is changed to the invalid state (I: Invalid state).
In addition, when a CPU issues a read request for data in the shared state (S state), the directory 100 is searched with the request address, and the holding status of the data is determined. When the search shows that the data at the request address is held in the exclusive state (E state), a snoop requesting a change of the data's state is sent to the CPU that holds the data. When the search shows that the data is held in the shared state (S state), a snoop is sent to the CPU that holds the data, and the requestor's CPU-ID is registered in the directory.
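The background directory behavior described in the two preceding paragraphs can be summarized as a small decision function. This is a hypothetical Python sketch, not part of the cited art: the names `State` and `directory_lookup` are ours, and two details the text does not spell out are assumptions marked in the comments (a request that finds the line in the I state sends no snoop, and a snooped E holder is downgraded to S on a shared read).

```python
from enum import Enum


class State(Enum):
    I = "invalid"
    S = "shared"
    E = "exclusive"


def directory_lookup(request, state, holders, requester):
    """Hypothetical model of the background directory protocol.

    request: State.E (read for update) or State.S (shared read).
    state, holders: current directory entry for the requested address.
    Returns (snoop_targets, new_state, new_holders).
    """
    if state is State.I:
        # Assumed: nobody holds the line, so no snoop is needed.
        return [], request, [requester]
    snoop_targets = list(holders)  # every registered holder is snooped
    if request is State.E:
        # Holders are invalidated; the requester becomes the exclusive owner.
        return snoop_targets, State.E, [requester]
    # Shared read: an E holder is asked to change state (assumed: to S),
    # an S holder keeps its copy, and the requester is added as a sharer.
    return snoop_targets, State.S, list(holders) + [requester]
```

Because every registered holder is a snoop target, the precision of the recorded holder list directly determines how much snoop traffic a request generates.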
Here, the directory format field in
In this way, when more than two CPUs are to be registered, the entry format in the directory 100 is changed from the A type (as depicted by
Japanese Laid-open Patent Publication No. 2001-101148
Japanese Laid-open Patent Publication No. 2005-044342
Recently, as information processing systems have grown in scale, a single node (board) mounts a plurality of CPUs, and the number of nodes (boards) that can be connected in a system has increased. For this reason, the number of nodes (or CPUs) managed by the directory of one node increases.
The amount of information a directory can hold is physically limited. Because the entry size of the directory is limited, when the number of nodes or CPUs holding data in the shared state (S state) increases, the directory cannot store detailed information identifying each CPU to which a snoop must be sent.
For example, when three or more CPUs hold the data in the shared state, the CPU information is held in the B-Type entry format in
Further, in the B-Type entry format, the number of CPUs that can be tracked is increased by recording a hardware unit above the CPU instead of the CPU itself. That is, the CPUs are recorded only by the ID of each unit (for example, the board ID, a system board being the unit). When the information is held per system board, for example, it is difficult to identify the individual CPU within the system board.
Therefore, the snoop must be sent to all the CPUs on the system board, and the snoop destinations cannot be narrowed down sufficiently. In this way, when the number of CPUs or nodes holding the data in the S state increases, the snoop must be dispatched to all CPUs on the system board whenever a request is dispatched, because the individual CPU cannot be identified. As a result, the amount of communication increases and performance decreases.
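The cost of board-granular tracking can be made concrete with a toy count. This Python sketch is illustrative only: it assumes four CPUs per system board and represents each sharing CPU as a hypothetical `(board, local)` pair; the function names are ours.

```python
def snoops_exact(sharers):
    """Snoops needed when the directory records each sharing CPU."""
    return len(set(sharers))


def snoops_board_granular(sharers, cpus_per_board=4):
    """Snoops needed when only board IDs are recorded (B-Type style):
    every CPU on every marked board must be snooped."""
    boards = {board for board, _local in sharers}
    return len(boards) * cpus_per_board


# Three sharing CPUs, each on a different board.
sharers = [(0, 1), (3, 0), (7, 2)]
print(snoops_exact(sharers), snoops_board_granular(sharers))  # 3 12
```

With the sharers spread over three boards, board-level tracking multiplies the snoop count by the number of CPUs per board, which is the traffic increase described above.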
According to an aspect of the embodiment, an information processing system includes a plurality of nodes, each of which includes at least one arithmetic processing unit, a cache memory that stores data used by the arithmetic processing unit, and a node controller that searches a directory and communicates a snoop to another node, the directory storing state information indicating whether the data stored in the cache memory is also stored in the cache memory of another node, together with identification information of that other node. The node controller includes a first directory, which stores the state information indicating whether the data stored in the cache memory is also stored in the cache memory of another node and the identification information of that other node, and a second directory, which stores information identifying the nodes that share data in the shared state, that is, data stored in the cache memory that is also stored in the cache memory of another node.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Hereinafter, the embodiments will be explained in the following order: an information processing system according to a first embodiment, data request processing in the S state, data request processing in the E state, an information processing system according to a second embodiment, and other embodiments. However, the information processing system and the directory are not limited to these embodiments.
As depicted by
As depicted by
Returning to
These communication paths 14-1 to 14-m constitute a common bus. Instead of separate paths in
The system controller 10 connects to each of the system boards 1-1 to 1-n via a management bus 12. The system controller 10 sets and monitors the status of the circuits (the CPUs, the memory, etc.) on each of the system boards 1-1 to 1-n. Furthermore, although not illustrated in
As depicted by
The processing unit 28 connects to the external node interface circuit 20, the CPU interface circuit 26, the directory 22, and the second directory (extension directory) 24. The processing unit 28 searches the directory 22, the second directory 24, and so on, and transmits snoops in response to read/write requests from the CPUs 3A and 3B and from other nodes.
The node controller 2 manages the data using the directory 22. For the address space of the cache memory of its own node, the directory 22 stores the state of the data and management information indicating which nodes hold the same data.
In this example, one entry of the directory 22 is 2 Bytes (16 bits) wide. Further, the example of
As depicted by
The reserve bit field 22-2 is a single spare bit. The status field 22-3 consists of two bits: the exclusive state (E state) is indicated by “10”, the invalid state (I state) by “00”, the shared state with a single CPU (S state) by “01”, and the shared state with two CPUs by “11”. The E state indicates that the CPU that issued the request (called the requester CPU) has exclusive control. The I state indicates that no CPU holds the data. The S state indicates that one or more CPUs share the data.
The CPU-ID (1) field 22-4 and the CPU-ID (2) field 22-5 of format type A each store the CPU-ID of a CPU (requester) that dispatched a request. The CPU-ID fields 22-4 and 22-5 are 6 bits each and store a 4-bit board (system board) ID and a 2-bit local ID (the CPU-ID within the board). Therefore, in this example, up to 16 nodes and up to four CPUs per node can be identified.
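A format-A entry can be modeled as a 16-bit integer. The field widths and status codes come from the text, and the 1-bit width of the format type field 22-1 follows from the 16-bit total (16 − 1 − 2 − 6 − 6). The bit ordering, the value 0 for format type A, the board-ID-in-high-bits layout of the CPU-ID, the labels `"S1"`/`"S2"`, and all helper names are assumptions made for this Python sketch.

```python
# Status field 22-3 codes for format type A, as listed in the text.
STATUS_CODES = {"I": 0b00, "S1": 0b01, "S2": 0b11, "E": 0b10}
CODE_TO_STATUS = {code: name for name, code in STATUS_CODES.items()}


def make_cpu_id(board, local):
    """6-bit CPU-ID: 4-bit board ID (assumed high bits) + 2-bit local ID."""
    if not (0 <= board < 16 and 0 <= local < 4):
        raise ValueError("board must be 0-15 and local 0-3")
    return (board << 2) | local


def split_cpu_id(cpu_id):
    return cpu_id >> 2, cpu_id & 0b11


def pack_type_a(status, cpu1=0, cpu2=0):
    """Assumed layout, MSB first: format type 22-1 (1 bit, 0 = A) | reserve 22-2 (1)
    | status 22-3 (2) | CPU-ID(1) 22-4 (6) | CPU-ID(2) 22-5 (6) = 16 bits.
    The format type and reserve bits are left at 0."""
    return (STATUS_CODES[status] << 12) | (cpu1 << 6) | cpu2


def unpack_type_a(entry):
    return CODE_TO_STATUS[(entry >> 12) & 0b11], (entry >> 6) & 0x3F, entry & 0x3F


entry = pack_type_a("S2", make_cpu_id(5, 1), make_cpu_id(12, 3))
print(unpack_type_a(entry))  # ('S2', 21, 51)
```

Only two 6-bit slots fit alongside the status bits, which is why a third sharer cannot be recorded in this format.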
Format type A cannot be used when three or more CPUs are in the shared state; in that case, the entry in the directory 22 is changed from format type A to format type B. The second status field 22-6 of format type B consists of 3 bits and is set to “111” when three or more CPUs share the data. The board ID bitmap field 22-7 consists of 12 bits and stores, in bitmap format, the board IDs of the CPUs (requesters) that issued requests. In this example, up to 12 nodes can be identified. However, the CPU within a node cannot be specified; in other words, detailed information cannot be stored per CPU.
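A format-B entry can be modeled as a 16-bit integer. Only the field widths and the “111” code come from the text; the bit order, the value 1 for format type B, the one-bit-per-board mapping of the 12-bit field, and the helper names are assumptions for this Python sketch.

```python
B_SHARED = 0b111  # second status field 22-6: three or more CPUs share the data


def pack_type_b(board_ids, status=B_SHARED):
    """Assumed layout, MSB first: format type 22-1 (1 bit, 1 = B)
    | second status 22-6 (3 bits) | board-ID bitmap 22-7 (12 bits, bit b = board b)."""
    bitmap = 0
    for board in board_ids:
        if not 0 <= board < 12:
            raise ValueError("the 12-bit bitmap covers boards 0-11 only")
        bitmap |= 1 << board
    return (1 << 15) | (status << 12) | bitmap


def boards_in(entry):
    """Only board numbers can be recovered; the CPU within a board is lost."""
    bitmap = entry & 0xFFF
    return [b for b in range(12) if (bitmap >> b) & 1]


entry = pack_type_b([0, 3, 11])
print(boards_in(entry))  # [0, 3, 11]
```

Decoding yields board numbers only, which is exactly the loss of per-CPU detail that the text describes.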
The extension directory 24 is a dedicated directory that stores detailed information identifying the CPUs that hold data in the shared state (S state), separately from the directory 22 in
The extension directory 24 has a valid bit field 24-1, a memory address field 24-2, a reserve bit field 24-3, and a CPU-ID bitmap field 24-4. The valid bit field 24-1 is one bit and indicates whether the entry in the extension directory 24 is valid (Enable=“1”) or invalid (Disable=“0”).
The extension directory 24 does not provide an entry for every memory address; it stores detailed information only for the CPUs that hold data in the shared state (S state). Therefore, the extension directory 24 includes a memory address field 24-2, which stores the upper 25 bits of the shared-state memory address, excluding the index and cache-line bits. The reserve bit field 24-3 holds the remaining spare bits (6 bits, given the 80-bit entry width). The CPU-ID bitmap field 24-4 consists of 48 bits, each of which identifies a single CPU, so in this example up to 48 CPUs can be identified. The entry width of the extension directory 24 is 80 bits.
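An 80-bit extension-directory entry can be modeled as a Python integer. The reserve width is not stated in the text; 6 bits is what remains of the 80-bit entry after the 1-bit valid flag, the 25-bit address tag, and the 48-bit CPU bitmap. The bit order, the mapping of a CPU to bit `board * 4 + local` (12 boards of four CPUs, matching the 48-bit width), and the helper names are hypothetical.

```python
VALID_BITS, TAG_BITS, BITMAP_BITS, ENTRY_BITS = 1, 25, 48, 80
RESERVE_BITS = ENTRY_BITS - VALID_BITS - TAG_BITS - BITMAP_BITS  # 6 spare bits


def cpu_bit(board, local):
    """Hypothetical mapping of a CPU to its bit in the CPU-ID bitmap 24-4."""
    if not (0 <= board < 12 and 0 <= local < 4):
        raise ValueError("the 48-bit bitmap covers 12 boards x 4 CPUs")
    return board * 4 + local


def pack_ext_entry(tag, cpus, valid=True):
    """Assumed layout, MSB first: valid 24-1 | address tag 24-2 | reserve 24-3 | bitmap 24-4."""
    if not 0 <= tag < 1 << TAG_BITS:
        raise ValueError("address tag must fit in 25 bits")
    bitmap = 0
    for board, local in cpus:
        bitmap |= 1 << cpu_bit(board, local)
    tag_shift = RESERVE_BITS + BITMAP_BITS
    return (int(valid) << (ENTRY_BITS - 1)) | (tag << tag_shift) | bitmap


def cpus_in(entry):
    """Recover the (board, local) pairs whose bits are set."""
    bitmap = entry & ((1 << BITMAP_BITS) - 1)
    return [divmod(i, 4) for i in range(BITMAP_BITS) if (bitmap >> i) & 1]


entry = pack_ext_entry(0x1ABCDE, [(0, 1), (5, 3), (11, 0)])
print(cpus_in(entry))  # [(0, 1), (5, 3), (11, 0)]
```

Unlike the 12-bit board bitmap of the directory 22, every sharing CPU comes back individually here, so a snoop can be addressed to exactly those CPUs.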
Thus, by giving the extension directory 24 a format different from that of the directory 22, detailed CPU information can be held without increasing the entry width of the directory 22, as depicted by
For example, when the information processing system has a cache memory of 1 Tera Byte, the directory 22 requires 32 Giga Bytes of memory, because each entry in the directory 22 is 2 Bytes. Identifying 48 CPUs with the entry format of the directory 22 would require an additional 36 bits per entry, extending the entry width of the directory 22 beyond 6 Bytes (6.5 Bytes, to be precise). For this reason, identifying more CPUs in this way would require a directory 22 of at least 96 Giga Bytes.
In the embodiment, on the other hand, the extension directory 24 stores information only when three or more CPUs share the data, so it needs to cover only data in the shared state in the directory 22. Moreover, in the information processing system, the probability of the shared state is lower than that of the exclusive state or the invalid state. Therefore, a capacity of a few Kilo Bytes up to at most 1 Mega Byte is sufficient for the extension directory 24. In other words, a 32 Giga Byte directory 22 combined with an extension directory 24 of up to 1 Mega Byte provides the same performance as a 96 Giga Byte directory 22.
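The capacity figures in the two preceding paragraphs can be reproduced with simple arithmetic. This Python sketch assumes binary units (1 Tera Byte = 2^40 bytes); the 64 bytes of memory covered by each directory entry is inferred from the stated 1 Tera Byte and 32 Giga Byte figures rather than given in the text.

```python
GiB, MiB, TiB = 1 << 30, 1 << 20, 1 << 40

memory_bytes = 1 * TiB
dir_entry_bytes = 2
dir_bytes = 32 * GiB

entries = dir_bytes // dir_entry_bytes      # 16 Gi directory entries
bytes_per_entry = memory_bytes // entries   # memory covered by one entry (inferred)

# Widening every entry from a 12-bit board bitmap to a 48-bit CPU bitmap.
wide_entry_bits = 16 + (48 - 12)            # 52 bits = 6.5 bytes
wide_dir_bytes = entries * wide_entry_bits // 8
six_byte_dir_bytes = entries * 6            # the 6-byte figure behind "96 Giga Byte"

# Extension directory: 80-bit (10-byte) entries in at most 1 MiB.
ext_entries = 1 * MiB // (80 // 8)

print(bytes_per_entry, wide_dir_bytes // GiB, six_byte_dir_bytes // GiB, ext_entries)
# 64 104 96 104857
```

At exactly 6.5 Bytes per entry the widened directory comes to 104 Giga Bytes, while 6 Bytes per entry gives the 96 Giga Byte figure; either way it dwarfs the roughly 100,000 entries that fit in a 1 Mega Byte extension directory.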
For this reason, detailed CPU information can be held with a minimal increase in directory capacity. In addition, because the detailed information in the extension directory 24 minimizes the number of snoops issued, an increase in traffic can be prevented.
(Data Request Processing in the S State)
(S10) The CPU 3A (or 3B) dispatches a read request in the S state to the node controller 2.
(S12) In the node controller 2, the processing unit 28 receives the read request via the CPU interface circuit 26 and searches the directory 22 using the read address contained in the read request.
(S14) The processing unit 28 refers to the status field 22-3 of the entry in the directory 22 for the read address. When the status field 22-3 indicates the invalid state (I state), no CPU holds the requested data; that is, no CPU has requested the data at the read address. When the status is determined to be the I state, the processing unit 28 proceeds to step S16.
(S16) The processing unit 28 in the node controller 2 registers the CPU-ID of the CPU that dispatched the request (here called the requester) and the status (S state) in the directory 22.
(S18) The processing unit 28 determines, from the status field 22-3 of the directory 22, whether the requested data is in the exclusive state (E state).
(S20) When the status is determined to be the E state, the processing unit 28 transmits, via the external node interface circuit 20, a snoop to the CPU whose CPU-ID is registered in the CPU-ID fields 22-4 and 22-5 of the directory 22. The snoop requests that CPU to change the state of the data. The process then proceeds to step S16, where the processing unit 28 registers the CPU-ID of the requester in the directory 22.
(S22) The processing unit 28 determines, from the status field 22-3 of the directory 22, whether the requested data is in the shared state (S state).
(S24) When the status is determined to be the S state, the processing unit 28 judges whether the CPU-ID can be registered in the directory 22. As described above, an A-type entry in the directory 22 can register only two CPU-IDs. The processing unit 28 determines that the CPU-ID can be registered when the detailed information can be stored in the directory 22 (the format A-Type in
(S26) When it is determined that the CPU-ID cannot be registered, the detailed information cannot be stored in the directory 22: either the A-type entry already holds two CPU-IDs, or the entry format has already been changed to B-Type. In that case, the processing unit 28 determines whether there is free space in the extension directory 24.
(S28) When there is free space in the extension directory 24, the processing unit 28 registers the CPU-ID of the requester in the extension directory 24 in bitmap format. In addition, the processing unit 28 registers the board ID of the requester CPU in the B-Type entry of the directory 22 in bitmap format. If the entry in the directory 22 must be changed from A-Type to B-Type, the processing unit 28 updates the format type field 22-1 to B-Type and the status field 22-3 to the shared state.
(S30) When there is no free space in the extension directory 24, the processing unit 28 registers the board ID of the requester CPU in the B-Type entry of the directory 22 in bitmap format.
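Steps S14 to S30 can be condensed into one function. This Python sketch is hypothetical: the names `DirEntry`, `shared_read`, and `board_of`, the dictionary standing in for the extension directory 24, and the `snoop` callback standing in for the external node interface circuit 20 are ours. Three points the text leaves open are filled by assumption and marked in comments: the previous E owner is kept as a sharer after S20, the sharers already in an A-type entry are copied into a new extension entry when it overflows, and a B-type entry without an extension entry is not given one later, so the extension directory never holds a partial sharer list. Like step S20, it assumes an E-state entry is A-type.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    fmt: str = "A"                            # format type 22-1: "A" or "B"
    state: str = "I"                          # "I", "S" or "E"
    cpus: list = field(default_factory=list)  # CPU-ID fields 22-4/22-5 (A-type, at most 2)
    boards: set = field(default_factory=set)  # board-ID bitmap 22-7 (B-type)


def board_of(cpu_id):
    return cpu_id >> 2  # assumed: upper 4 bits of the 6-bit CPU-ID


def shared_read(entry, ext_dir, ext_capacity, addr, requester, snoop):
    """Hypothetical model of steps S14-S30 for a read request in the S state.

    ext_dir maps an address to the set of sharing CPU-IDs (extension directory 24).
    """
    if entry.state == "I":                          # S14 -> S16
        entry.state, entry.cpus = "S", [requester]
    elif entry.state == "E":                        # S18 -> S20 -> S16
        for cpu in entry.cpus:
            snoop(cpu)                              # ask the owner to change state
        entry.state = "S"
        entry.cpus = entry.cpus + [requester]       # assumed: old owner stays a sharer
    elif entry.fmt == "A" and len(entry.cpus) < 2:  # S22 -> S24: room left
        entry.cpus.append(requester)                # S16
    else:                                           # S26: detail does not fit
        if entry.fmt == "A":                        # overflow: switch to B-type
            sharers = set(entry.cpus) | {requester}
            entry.fmt, entry.boards, entry.cpus = "B", {board_of(c) for c in entry.cpus}, []
            if len(ext_dir) < ext_capacity:         # S28: free space in directory 24
                ext_dir[addr] = sharers             # assumed: existing sharers copied too
        elif addr in ext_dir:                       # S28: address already tracked
            ext_dir[addr].add(requester)
        entry.boards.add(board_of(requester))       # S28 / S30: board bitmap


ext = {}
e = DirEntry()
for cpu in (5, 9, 46):  # three CPUs read the same line
    shared_read(e, ext, 1, 0x40, cpu, print)
print(e.fmt, sorted(e.boards), sorted(ext[0x40]))  # B [1, 2, 11] [5, 9, 46]
```

When the extension directory is full (S30), only the board bitmap grows, so later snoops for that address fall back to board granularity.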
As illustrated by
The processing unit 28 in the node controller 2 registers the CPU-ID of the CPU that dispatched the request and the status (S state) in the directory 22 (S103). The processing unit 28 determines, from the status field 22-3 of the directory 22, whether the requested data is in the exclusive state (E state). When the status is determined to be the E state, the processing unit 28 transmits, via the external node interface circuit 20, a snoop to the CPU whose CPU-ID is registered in the CPU-ID fields 22-4 and 22-5 of the directory 22 (S104). The process then proceeds to step S103, where the processing unit 28 registers the CPU-ID of the requester in the directory 22.
The processing unit 28 determines, from the status field 22-3 of the directory 22, whether the requested data is in the shared state (S state) (S105). When the status is determined to be the S state, the processing unit 28 judges whether the CPU-ID can be registered in the directory 22. If it can, the process proceeds to step S103, where the processing unit 28 registers the CPU-ID of the requester in the directory 22. If it cannot, the processing unit 28 registers the board ID of the requester CPU in the B-Type entry of the directory 22 in bitmap format. If the entry in the directory 22 must be changed from A-Type to B-Type, the processing unit 28 updates the format type field 22-1 to B-Type and the status field 22-3 to the shared state (S106).
In this way, the embodiment provides an extension directory 24, with a format different from that of the directory 22, that is used only for the S state, and registers the requester CPU-ID in the extension directory 24 in bitmap format. Therefore, even as the number of CPUs installed in the information processing system increases, the CPUs holding data in the S state can be identified with a minimal increase in directory capacity.
(Data Request Processing in the E State)
(S40) The CPU 3A (or 3B) dispatches a read request in the E state to the node controller 2.
(S42) In the node controller 2, the processing unit 28 receives the read request via the CPU interface circuit 26 and searches the directory 22 using the read address contained in the read request.
(S44) The processing unit 28 refers to the status field 22-3 of the entry in the directory 22 for the read address. When the status field 22-3 indicates the invalid state (I state), no CPU holds the requested data. When the status is determined to be the I state, the processing unit 28 proceeds to step S46.
(S46) The processing unit 28 in the node controller 2 registers the CPU-ID of the CPU that dispatched the request (here called the requester) and the status (E state) in the directory 22.
(S48) The processing unit 28 determines, from the status field 22-3 of the directory 22, whether the requested data is in the exclusive state (E state).
(S50) When the status is determined to be the E state, the processing unit 28 transmits, via the external node interface circuit 20, a snoop to the CPU whose CPU-ID is registered in the CPU-ID fields 22-4 and 22-5 of the directory 22. The snoop requests that CPU to change the state of the data. The process then proceeds to step S46, where the processing unit 28 registers the CPU-ID of the requester in the directory 22.
(S52) The processing unit 28 determines, from the status field 22-3 of the directory 22, whether the requested data is in the shared state (S state).
(S54) When the status is determined to be the S state, the processing unit 28 judges whether two or fewer CPU-IDs are registered in the directory 22. As described above, an A-type entry in the directory 22 can register only two CPU-IDs. When two or fewer CPU-IDs are registered, the processing unit 28 transmits, via the external node interface circuit 20, a snoop to the CPUs whose CPU-IDs are registered in the CPU-ID fields 22-4 and 22-5 of the directory 22. The process then proceeds to step S46, where the processing unit 28 updates the directory 22. That is, when a single CPU-ID is registered in the directory 22, the processing unit 28 registers the CPU-ID of the requester in the directory 22. When two CPU-IDs are registered, the processing unit 28 changes the entry in the directory 22 from A-type to B-type: it updates the format type field 22-1 to B-type and the status field to the E state, and registers in bitmap form a first board ID, of the board mounting the already-registered CPU, and a second board ID, of the board mounting the CPU now being registered.
(S56) When more than two CPU-IDs are registered, the processing unit 28 searches the extension directory 24 using the read address.
(S58) The processing unit 28 determines whether an address corresponding to the read address of the request exists in the address field 24-2 of the extension directory 24 (called the HIT determination).
(S60) When the corresponding address exists in the address field 24-2 of the extension directory 24 (a HIT), the processing unit 28 transmits, via the external node interface circuit 20, a snoop to the CPUs whose CPU-IDs are registered in the CPU-ID bitmap field 24-4 of the extension directory 24.
(S62) After transmitting the snoop, the processing unit 28 registers the CPU-ID of the requester in the CPU-ID bitmap field 24-4 of the extension directory 24 in bitmap form. In addition, the processing unit 28 registers the board ID of the requester CPU in the B-type entry of the directory 22 in bitmap form, and updates the status field 22-6 in the directory 22 to the E state.
(S64) When the corresponding address does not exist in the address field 24-2 of the extension directory 24, the processing unit 28 transmits, via the external node interface circuit 20, a snoop to the boards whose board IDs are registered in the B-type entry of the directory 22. The processing unit 28 then registers the board ID of the requester CPU in the B-type entry of the directory 22 in bitmap form and updates the status field 22-6 in the directory 22 to the E state.
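Steps S44 to S64 can be condensed into one function that returns the snoop targets instead of sending them. This Python sketch is hypothetical and uses our own names (`DirEntry`, `exclusive_read`, `board_of`, and a dictionary standing in for the extension directory 24). It follows the text in registering the requester alongside a single snooped CPU (S54) and in recording board IDs on the switch to B-type. Two choices are assumptions: after an E-state snoop the requester replaces the owner, and any B-type entry, whatever its status, takes the extension-directory path of S56 to S64.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    fmt: str = "A"                            # format type 22-1: "A" or "B"
    state: str = "I"                          # "I", "S" or "E"
    cpus: list = field(default_factory=list)  # CPU-ID fields 22-4/22-5 (A-type)
    boards: set = field(default_factory=set)  # board-ID bitmap 22-7 (B-type)


def board_of(cpu_id):
    return cpu_id >> 2  # assumed: upper 4 bits of the 6-bit CPU-ID


def exclusive_read(entry, ext_dir, addr, requester):
    """Hypothetical model of steps S44-S64 for a read request in the E state.

    Returns the snoop targets: ("cpu", id) for a single CPU, or ("board", id)
    when every CPU on that system board has to be snooped.
    """
    if entry.fmt == "B":                              # more than two sharers (S56)
        if addr in ext_dir:                           # S58: HIT
            targets = [("cpu", c) for c in sorted(ext_dir[addr])]  # S60
            ext_dir[addr].add(requester)              # S62
        else:                                         # S64: snoop whole boards
            targets = [("board", b) for b in sorted(entry.boards)]
        entry.boards.add(board_of(requester))         # S62 / S64
        entry.state = "E"
        return targets
    # A-type entry: snoop every registered CPU-ID (none in the I state).
    targets = [("cpu", c) for c in entry.cpus]        # S50 / S54
    if entry.state in ("I", "E"):                     # S44/S48 -> S46
        entry.cpus = [requester]                      # assumed: requester is sole owner
    elif len(entry.cpus) == 1:                        # S54: one CPU-ID registered
        entry.cpus = entry.cpus + [requester]
    else:                                             # S54: two registered, go to B-type
        entry.boards = {board_of(c) for c in entry.cpus + [requester]}
        entry.fmt, entry.cpus = "B", []
    entry.state = "E"
    return targets


ext = {0x40: {5, 9, 46}}
hit = DirEntry(fmt="B", state="S", boards={1, 2, 11})
miss = DirEntry(fmt="B", state="S", boards={1, 2, 11})
print(exclusive_read(hit, ext, 0x40, 12))   # exact CPUs 5, 9 and 46
print(exclusive_read(miss, ext, 0x80, 12))  # every CPU on boards 1, 2 and 11
```

On a HIT only the three recorded CPUs are snooped, whereas a miss falls back to whole boards, which with four CPUs per board would mean twelve snoops.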
The CPU 3A (or 3B) dispatches a read request in the E state to the node controller 2 (S110). The processing unit 28 searches the directory 22 using the read address contained in the read request and determines whether the status field 22-3 of the entry for the read address indicates the invalid state (I state) (S112). When the status is determined to be the I state, the processing unit 28 registers the CPU-ID of the requester and the status (E state) in the directory 22 (S113).
The processing unit 28 determines, from the status field 22-3 of the directory 22, whether the requested data is in the exclusive state (E state) (S114). When the status is determined to be the E state, the processing unit 28 transmits, via the external node interface circuit 20, a snoop to the CPU whose CPU-ID is registered in the CPU-ID fields 22-4 and 22-5 of the directory 22 (S115). The snoop requests that CPU to change the state of the data. The process then proceeds to step S113, where the processing unit 28 registers the CPU-ID of the requester in the directory 22.
The processing unit 28 determines, from the status field 22-3 of the directory 22, whether the requested data is in the shared state (S state) (S116). When the status is determined to be the S state, the processing unit 28 judges whether two or fewer CPU-IDs are registered in the directory 22 (S117). When two or fewer CPU-IDs are registered, the processing unit 28 transmits, via the external node interface circuit 20, a snoop to the CPUs whose CPU-IDs are registered in the CPU-ID fields 22-4 and 22-5 of the directory 22 (S115). The process then proceeds to step S113, where the processing unit 28 updates the directory 22. That is, when a single CPU-ID is registered in the directory 22, the processing unit 28 registers the CPU-ID of the requester in the directory 22. When two CPU-IDs are registered, the processing unit 28 changes the entry in the directory 22 from A-type to B-type: it updates the format type field 22-1 to B-type and the status field to the E state, and registers in bitmap form a first board ID, of the board mounting the already-registered CPU, and a second board ID, of the board mounting the CPU now being registered (S113).
When more than two CPU-IDs are registered, the processing unit 28 transmits, via the external node interface circuit 20, a snoop to the CPUs or boards registered in the board-ID bitmap field 22-7 of the B-type entry in the directory 22. The processing unit 28 then registers the CPU-ID or board ID of the requester in the B-type entry of the directory 22 in bitmap form, and updates the status field 22-6 in the directory 22 to the E state (S118).
In this way, in the E-state processing as well, the extension directory 24, which has a format different from that of the directory 22 and is used only for the S state, holds the requester CPU-IDs in bitmap format. Therefore, even as the number of CPUs installed in the information processing system increases, the CPUs holding data in the S state can be identified with a minimal increase in directory capacity.
Therefore, the snoop destinations can be narrowed and traffic reduced even when the cache memories 4A, 4B on the system boards (nodes) 1-1 to 1-n are used as a shared cache memory. In particular, even as the number of CPUs and nodes holding data in the S state increases, the snoop destination CPUs can be identified when a request is issued, which reduces traffic. This contributes to improved performance.
As depicted by
The first memory 4 constitutes the L2 cache memory, and the second memory 5 constitutes the L3 cache memory. The first and second memories 4 and 5 may be implemented with DIMMs (Dual Inline Memory Modules), for example. The node controller 2 handles communication between the system boards 1-1 to 1-4. In this example, the node controller 2 on the first system board 1-1 connects to the node controller 2 on the second system board 1-2 through a first communication path 14-1, and the node controller 2 on the second system board 1-2 connects to the node controller 2 on the third system board 1-3 through a second communication path 14-2. Likewise, the node controller 2 on the third system board 1-3 connects to the node controller 2 on the fourth system board 1-4 via a third communication path 14-3.
The system controller 10 sets and monitors the status of the circuits (the CPUs, the memory, etc.) on each of the system boards 1-1 to 1-4. The system controllers 10 provided on the system boards 1-1 to 1-4 are connected to one another via the management bus 12. Furthermore, each system controller 10 reports the operational status of its own system board and monitors the status of the other system boards via the management bus 12.
Further, the node controller 2 includes the directory 22 and the extension directory 24 in a memory space that includes the additional cache memory 5, in the same way as the configuration in
Further, since a system controller 10 is provided on each of the system boards 1-1 to 1-4, the load on each system controller can be reduced compared with the first embodiment. As in the first embodiment, the snoop destinations in the shared state can be narrowed and traffic reduced, even in an information processing system in which the cache memory is easy to expand.
In the embodiments described above, a single node has a single system board. However, a single node may have a plurality of system boards, and a plurality of nodes may be provided on a single system board. Likewise, although each system board is described as carrying two CPUs, three or more CPUs may be mounted on a single system board.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2010/061785 filed on Jul. 12, 2010 and designated the U.S., the entire contents of which are incorporated herein by reference.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/JP2010/061785 | Jul 2010 | US
Child | 13738433 | | US